
Enterprise Linux sickness with overcomplexity:
a slightly skeptical view on enterprise Linux distributions


Introduction

Imagine a language in which both the grammar and the vocabulary change every three to five years, and both are so huge that they are beyond any normal human's comprehension. You can learn some subset of the vocabulary and the grammar when you work closely with a particular subsystem for several months in a row, only to forget it after a couple of months or quarters. The classic example here is RHEL kickstart.

In a sense, all the talk about Linux security is a joke, because you cannot secure an OS that is far, far beyond your ability to comprehend. So state-sponsored hackers will always have an edge in breaking into Linux.

Linux has become too complex for a single person to master. It is now yet another monstrous OS that nobody knows well (its sheer size puts it far above the capabilities of mere mortals). And that's the problem. Both Red Hat and Suse are now software development companies that can be called "overcomplexity junkies", and it shows in their recent products. Actually SLES is even worse than RHEL in this respect, despite being (originally) a German distribution.

Generally in Linux administration (as previously in enterprise Unix administration) you get what you pay for. Nothing can replace multi-year experience, and experience is often acquired by making expensive mistakes (see Admin Horror Stories). Vendor training is expensive and is more or less available only to sysadmins in a few industries (the financial industry is one). With Red Hat the situation closely resembles the one well known from Solaris: the training is rather good, but the prices are exorbitant.

Due to the current complexity (or, more correctly, overcomplexity) of Linux environments, most sysadmins can master only the commonly used subsystems, and only for a single flavor of Linux. The better ones might be able to support two (with a highly asymmetrical level of skills, usually being considerably more proficient in one flavor than in the other). In other words, the Unix wars are now replayed on Linux turf with a vengeance.

The level of mental overload and frustration caused by the overcomplexity of the two major enterprise Linux flavors (RHEL and SLES) is such that people are ready for a change. Note that in the OS ecosystem there is a natural tendency toward monopoly -- nothing succeeds like success -- and the critical mass of installations that those two "monstrously complex" Linux distributions hold prevents any escape, especially in enterprise environments. Red Hat can essentially dictate what Linux should be -- as it did by incorporating systemd into RHEL 7.

Still, there is a large difference in popularity between RHEL and SLES.

Ubuntu -- a dumbed-down Linux based on Debian, with some strange design decisions -- is now getting some corporate sales, especially in cloud environments, at the expense of Suse. It is still mainly a desktop OS, but it is gradually acquiring some enterprise share too. That brings the number of enterprise Linux distributions close to what we used to have in the commercial Unix space (Solaris, AIX and HP-UX), with Debian/Ubuntu playing the role of Solaris.

Troubles with SELinux

Until recently SLES was slightly simpler than RHEL, as it did not include the horribly complex security subsystem that RHEL uses -- SELinux. It takes a lot of effort to learn even the basics of SELinux and to configure properly even one Internet-facing server. Most sysadmins just use it blindly, either enabling or disabling it without understanding the details of its functioning (or, more correctly, understanding it only at the level that allows them to use common protocols, much as is the case with firewalls).

Actually Linux had a better solution, used in SLES: AppArmor, which was a pretty elegant solution to a complex problem, if you ask me. But the critical mass of installations and the market share secured by Red Hat made SELinux "king of the hill" and prevented AppArmor from becoming the Linux standard. As a result SUSE was forced to incorporate SELinux.

SELinux provides a Mandatory Access Control (MAC) system built into the Linux kernel (that is, the kind of scheme that labels things "super secret", "secret" and "confidential", which the three-letter agencies use to guard information). Historically, Security Enhanced Linux (SELinux) was an open source project sponsored by the National Security Agency. Despite the user-friendly GUI, SELinux is difficult to configure and hard to understand, and the documentation does not help much either. Most administrators simply turn the SELinux subsystem off during the initial install, but for an Internet-facing server you need to configure and use it, or else... And sometimes the effects can be really subtle: for example, you can log in as root using password authentication, but not with a passwordless ssh certificate. That's why many complex applications, especially in the HPC area, explicitly recommend disabling SELinux as a starting point of the installation. You can find articles on the Web devoted to this topic.
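
Before disabling SELinux outright, it is usually worth checking what it is actually doing. A minimal troubleshooting sketch using the standard RHEL tools (the ~/.ssh example ties back to the symptom mentioned above; the audit and policycoreutils tool packages may need to be installed first):

   # check the current SELinux state
   getenforce                      # prints Enforcing, Permissive, or Disabled
   sestatus                        # more detailed status, including the loaded policy

   # switch to permissive mode until the next reboot: denials are logged but not enforced
   setenforce 0
   # to make the change permanent, set SELINUX=permissive (or disabled) in /etc/selinux/config

   # inspect recent AVC denials and get a human-readable explanation
   ausearch -m avc -ts recent | audit2why

   # the "root can log in with a password but not with a key" symptom is often just
   # wrong labels on ~/.ssh; restoring the default context usually fixes it
   restorecon -R -v /root/.ssh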

SELinux produces some very interesting errors (see for example http://bugs.mysql.com/bug.php?id=12676) and is not very compatible with some subsystems and complex applications. Especially telling is this comment to the blog post How to disable SELinux in RHEL 5:

Aeon said... @ May 13, 2008 2:34 PM
 
Thanks a million! I was dealing with a samba refusing to access the server shared folders. After about 2 hours of scrolling forums I found out the issue may be this shitty thing samba_selinux.

I usually disable it when I install, but this time I had to use the Dell utilities (no choice at all) and they enabled the thing. Disabled it your way, rebooted and it works as I wanted it. Thanks again!

SLES has one significant defect: by default it does not assign each user a unique group the way RHEL does. But this can be fixed with a special wrapper for the useradd command. In its simplest form it can be just:

   # Wrapper for the useradd command that gives each user a private group,
   # mimicking the RHEL user-private-group scheme.
   # Accepts two arguments: UID and user name, for example:
   #    uadd 3333 joedoers

   function uadd
   {
      groupadd -g "$1" "$2"             # create a group with the same numeric ID and name as the user
      useradd -u "$1" -g "$1" -m "$2"   # create the user with that group as primary and make a home dir
   }
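
A hypothetical usage example (the UID and user name are arbitrary), assuming the function has been sourced into root's shell:

   uadd 3333 joedoers
   id joedoers     # expected output: uid=3333(joedoers) gid=3333(joedoers) groups=3333(joedoers)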

Working closely with commercial Linuxes and seeing all their warts, one instantly understands that traditional Open Source (GPL-based Open Source) is a very problematic business model. Historically (especially in the case of Red Hat) it was used as a smoke screen for VCs to get software engineers to work for free -- not even for minimum wage, but for free -- and to grab as much money from suckers as they can, using all the right words as an anesthetic. Essentially they take this hard work, pump $$$ into marketing, and either sell the resulting company to one of their other portfolio companies or take it public and dump the shares on the public. Meanwhile the software engineers who worked to develop that software for free -- aka slave labor -- get $0.00 for their hard work, while the VCs, the top brass of the startup, and the investment bankers make a killing.

And of course they then get their buddies in the mainstream media to hype GPL-based Open Source development as the best thing since sliced bread.

Licensing

RHEL licensing is a mess too. The two higher-level licenses are expensive and make the Microsoft server license look very competitive. Recently Red Hat went the "IBM way" and started to charge different prices for 4-socket servers: with their new registration manager you can no longer use two 2-socket licenses to cover a 4-socket server. The next step will be classic IBM per-core licensing; that's why so many people passionately hate IBM.

There are three different types of licensing (let's call them patch-only, regular, and premium support). Each has several variations (for example, the HPC compute-node license is a variant of the "patch-only" license, but does not provide a GUI and many packages in the repository). The level of tech support with the latter two (which are the truly enterprise licenses) is very similar -- similarly dismal -- especially for complex problems, unless you press them really hard.

In addition, Red Hat has screwed up its portal so much that you can't tell which server is assigned to which license. That situation improved with the registration manager, but new problems arose.

Generally the level of screw-up of the RHEL user portal is such that there are doubts they can do anything useful in the Linux space in the future, other than try to hold on to their market share.

All in all, RHEL 6 is very complex but still a usable enterprise Linux distribution, because it did not change radically from RHEL 4 and 5. But it is no fun to use anymore. It's a pain. It's a headache. The same is true for SLES.

For RHEL 7 much stronger words are applicable.



Old News ;-)


[Dec 05, 2019] Life as a Linux system administrator (Enable Sysadmin)

System administration isn't easy, nor is it for the thin-skinned.
Notable quotes:
"... System administration covers just about every aspect of hardware and software management for both physical and virtual systems. ..."
"... An SA's day is very full. In fact, you never really finish, but you have to pick a point in time to abandon your activities. Being an SA is a 24x7x365 job, which does take its toll on you physically and mentally. You'll hear a lot about burnout in this field. We, at Enable Sysadmin, have written several articles on the topic. ..."
"... You are the person who gets blamed when things go wrong and when things go right, it's "just part of your job." It's a tough place to be. ..."
"... Dealing with people is hard. Learn to breathe, smile, and comply if you want to survive and maintain your sanity. ..."
Dec 05, 2019 | www.redhat.com

... ... ...

What a Linux System Administrator does

A Linux system administrator wears many hats and the smaller your environment, the more hats you will wear. Linux administration covers backups, file restores, disaster recovery, new system builds, hardware maintenance, automation, user maintenance, filesystem housekeeping, application installation and configuration, system security management, and storage management. System administration covers just about every aspect of hardware and software management for both physical and virtual systems.

Oddly enough, you also need a broad knowledge base of network configuration, virtualization, interoperability, and yes, even Windows operating systems. A Linux system administrator needs to have some technical knowledge of network security, firewalls, databases, and all aspects of a working network. The reason is that, while you're primarily a Linux SA, you're also part of a larger support team that often must work together to solve complex problems.

Security, in some form or another, is often at the root of issues confronting a support team. A user might not have proper access or too much access. A daemon might not have the correct permissions to write to a log directory. A firewall exception hasn't been saved into the running configuration of a network appliance. There are hundreds of fail points in a network and your job is to help locate and resolve failures.

Linux system administration also requires that you stay on top of best practices, learn new software, maintain patches, read and comply with security notifications, and apply hardware updates. An SA's day is very full. In fact, you never really finish, but you have to pick a point in time to abandon your activities. Being an SA is a 24x7x365 job, which does take its toll on you physically and mentally. You'll hear a lot about burnout in this field. We, at Enable Sysadmin, have written several articles on the topic.

The hardest part of the job

Doing the technical stuff is relatively easy. It's dealing with people that makes the job really hard. That sounds terrible but it's true. On one side, you deal with your management, which is not always easy. You are the person who gets blamed when things go wrong and when things go right, it's "just part of your job." It's a tough place to be.

Coworkers don't seem to make life better for the SA. They should, but they often don't. You'll deal with lazy, unmotivated coworkers so often that you'll feel that you're carrying all the weight of the job yourself. Not all coworkers are bad. Some are helpful, diligent, proactive types and I've never had the pleasure of working with too many of them. It's hard to do your work and then take on the dubious responsibility of making sure everyone else does theirs as well.

And then there are users. Oh the bane of every SA's life, the end user. An SA friend of mine once said, "You know, this would be a great job if I just didn't have to interface with users." Agreed. But then again, with no users, there's probably also not a job. Dealing with computers is easy. Dealing with people is hard. Learn to breathe, smile, and comply if you want to survive and maintain your sanity.

... ... ...

[Nov 09, 2019] Mirroring a running system into a ramdisk (Oracle Linux Blog)

Nov 09, 2019 | blogs.oracle.com


Mirroring a running system into a ramdisk, by Greg Marsden

In this blog post, Oracle Linux kernel developer William Roche presents a method to mirror a running system into a ramdisk.

A RAM-mirrored system?

There are cases where a system can boot correctly but after some time, can lose its system disk access - for example an iSCSI system disk configuration that has network issues, or any other disk driver problem. Once the system disk is no longer accessible, we rapidly face a hang situation followed by I/O failures, without the possibility of local investigation on this machine. I/O errors can be reported on the console:

 XFS (dm-0): Log I/O Error Detected....

Or losing access to basic commands like:

# ls
-bash: /bin/ls: Input/output error

The approach presented here allows a small system disk space to be mirrored in memory to avoid the above I/O failures situation, which provides the ability to investigate the reasons for the disk loss. The system disk loss will be noticed as an I/O hang, at which point there will be a transition to use only the ram-disk.

To enable this, the Oracle Linux developer Philip "Bryce" Copeland created the following method (more details will follow):

Disk and memory sizes:

As we are going to mirror the entire system installation to the memory, this system installation image has to fit in a fraction of the memory - giving enough memory room to hold the mirror image and necessary running space.

Of course this is a trade-off between the memory available to the server and the minimal disk size needed to run the system. For example a 12GB disk space can be used for a minimal system installation on a 16GB memory machine.

A standard Oracle Linux installation uses XFS as root fs, which (currently) can't be shrunk. In order to generate a usable "small enough" system, it is recommended to proceed to the OS installation on a correctly sized disk space. Of course, a correctly sized installation location can be created using partitions of large physical disk. Then, the needed application filesystems can be mounted from their current installation disk(s). Some system adjustments may also be required (services added, configuration changes, etc...).

This configuration phase should not be underestimated as it can be difficult to separate the system from the needed applications, and keeping both on the same space could be too large for a RAM disk mirroring.

The idea is not to keep an entire system load active when losing disks access, but to be able to have enough system to avoid system commands access failure and analyze the situation.

We are also going to avoid the use of swap. When the system disk access is lost, we don't want to require it for swap data. Also, we don't want to use more memory space to hold a swap space mirror. The memory is better used directly by the system itself.

The system installation can have a swap space (for example a 1.2GB space on our 12GB disk example) but we are neither going to mirror it nor use it.

Our 12GB disk example could be used with: 1GB /boot space, 11GB LVM Space (1.2GB swap volume, 9.8 GB root volume).

Ramdisk memory footprint:

The ramdisk size has to be a little larger (8M) than the root volume size that we are going to mirror, making room for metadata. But we can deal with 2 types of ramdisk: the standard block ramdisk driver (brd) and the compressed ramdisk driver (zram).

We can expect roughly a 30% to 50% memory space gain from zram compared to brd, but zram must use 4k I/O blocks only. This means that the filesystem used for root has to deal only with I/Os that are a multiple of 4k.

Basic commands:

Here is a simple list of commands to manually create and use a ramdisk and mirror the root filesystem space. We create a temporary configuration that needs to be undone or the subsequent reboot will not work. But we also provide below a way of automating at startup and shutdown.

Note the root volume size (considered to be ol/root in this example):

# lvs --units k -o lv_size ol/root
  LSize
  10268672.00k

Create a ramdisk a little larger than that (at least 8M larger):

# modprobe brd rd_nr=1 rd_size=$((10268672 + 8*1024))

Verify the created disk:

# lsblk /dev/ram0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
ram0   1:0    0 9.8G  0 disk

Put the disk under lvm control

# pvcreate /dev/ram0
  Physical volume "/dev/ram0" successfully created.
# vgextend ol /dev/ram0
  Volume group "ol" successfully extended
# vgscan --cache
  Reading volume groups from cache.
  Found volume group "ol" using metadata type lvm2
# lvconvert -y -m 1 ol/root /dev/ram0
  Logical volume ol/root successfully converted.

We now have ol/root mirror to our /dev/ram0 disk.

# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---   9.79g                                  40.70            root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---   9.79g                                                   /dev/sda2(307)
  [root_rimage_1] ol Iwi-aor---   9.79g                                                   /dev/ram0(1)
  [root_rmeta_0]  ol ewi-aor---   4.00m                                                   /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---   4.00m                                                   /dev/ram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)

A few minutes (or seconds) later, the synchronization is completed:

# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---   9.79g                                 100.00            root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---   9.79g                                                   /dev/sda2(307)
  [root_rimage_1] ol iwi-aor---   9.79g                                                   /dev/ram0(1)
  [root_rmeta_0]  ol ewi-aor---   4.00m                                                   /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---   4.00m                                                   /dev/ram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)

We have our mirrored configuration running !

For security, we can also remove the swap, /boot and /boot/efi (if it exists) mount points:

# swapoff -a
# umount /boot/efi
# umount /boot

Stopping the system also requires some actions, as you need to clean up the configuration so that it will not be looking for a now-gone ramdisk on reboot.

# lvconvert -y -m 0 ol/root /dev/ram0
  Logical volume ol/root successfully converted.
# vgreduce ol /dev/ram0
  Removed "/dev/ram0" from volume group "ol"
# mount /boot
# mount /boot/efi
# swapon -a
What about in-memory compression?

As indicated above, zRAM devices can compress data in-memory, but 2 main problems need to be fixed first: LVM must be told to accept zram devices, and the root filesystem must use 4k sector I/Os.

Make lvm work with zram:

The lvm configuration file has to be changed to take the "zram" type of device into account, by including the following "types" entry in the "devices" section of the /etc/lvm/lvm.conf file:

devices {
    types = [ "zram", 16 ]
}
Root file system I/Os:

A standard Oracle Linux installation uses XFS, and we can check the sector size used (depending on the disk type used) with

# xfs_info /
meta-data=/dev/mapper/ol-root  isize=256    agcount=4, agsize=641792 blks
         =                     sectsz=512   attr=2, projid32bit=1
         =                     crc=0        finobt=0 spinodes=0
data     =                     bsize=4096   blocks=2567168, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
log      =internal             bsize=4096   blocks=2560, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0

We can notice here that the sector size (sectsz) used on this root fs is a standard 512 bytes. This fs type cannot be mirrored with a zRAM device, and needs to be recreated with 4k sector sizes.

Transforming the root file system to 4k sector size:

This is simply a backup (to a zram disk) and restore procedure after recreating the root FS. To do so, the system has to be booted from another system image. Booting from an installation DVD image can be a good possibility.

sh-4.2# vgchange -a y ol
  2 logical volume(s) in volume group "ol" now active
sh-4.2# mount /dev/mapper/ol-root /mnt
sh-4.2# modprobe zram
sh-4.2# echo 10G > /sys/block/zram0/disksize
sh-4.2# mkfs.xfs /dev/zram0
meta-data=/dev/zram0           isize=256    agcount=4, agsize=655360 blks
         =                     sectsz=4096  attr=2, projid32bit=1
         =                     crc=0        finobt=0, sparse=0
data     =                     bsize=4096   blocks=2621440, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
log      =internal log         bsize=4096   blocks=2560, version=2
         =                     sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
sh-4.2# mkdir /mnt2
sh-4.2# mount /dev/zram0 /mnt2
sh-4.2# xfsdump -L BckUp -M dump -f /mnt2/ROOT /mnt
xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.1.7 (dump format 3.0) - type ^C for status and control
xfsdump: level 0 dump of localhost:/mnt
...
xfsdump: dump complete: 130 seconds elapsed
xfsdump: Dump Summary:
xfsdump:   stream 0 /mnt2/ROOT OK (success)
xfsdump: Dump Status: SUCCESS
sh-4.2# umount /mnt
sh-4.2# mkfs.xfs -f -s size=4096 /dev/mapper/ol-root
meta-data=/dev/mapper/ol-root  isize=256    agcount=4, agsize=641792 blks
         =                     sectsz=4096  attr=2, projid32bit=1
         =                     crc=0        finobt=0, sparse=0
data     =                     bsize=4096   blocks=2567168, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
log      =internal log         bsize=4096   blocks=2560, version=2
         =                     sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
sh-4.2# mount /dev/mapper/ol-root /mnt
sh-4.2# xfsrestore -f /mnt2/ROOT /mnt
xfsrestore: using file dump (drive_simple) strategy
xfsrestore: version 3.1.7 (dump format 3.0) - type ^C for status and control
xfsrestore: searching media for dump
...
xfsrestore: restore complete: 337 seconds elapsed
xfsrestore: Restore Summary:
xfsrestore:   stream 0 /mnt2/ROOT OK (success)
xfsrestore: Restore Status: SUCCESS
sh-4.2# umount /mnt
sh-4.2# umount /mnt2
sh-4.2# reboot
$ xfs_info /
meta-data=/dev/mapper/ol-root  isize=256    agcount=4, agsize=641792 blks
         =                     sectsz=4096  attr=2, projid32bit=1
         =                     crc=0        finobt=0 spinodes=0
data     =                     bsize=4096   blocks=2567168, imaxpct=25
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
log      =internal             bsize=4096   blocks=2560, version=2
         =                     sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0

With sectsz=4096, our system is now ready for zRAM mirroring.

Basic commands with a zRAM device:

# modprobe zram
# zramctl --find --size 10G
/dev/zram0
# pvcreate /dev/zram0
  Physical volume "/dev/zram0" successfully created.
# vgextend ol /dev/zram0
  Volume group "ol" successfully extended
# vgscan --cache
  Reading volume groups from cache.
  Found volume group "ol" using metadata type lvm2
# lvconvert -y -m 1 ol/root /dev/zram0
  Logical volume ol/root successfully converted.
# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---   9.79g                                  12.38            root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---   9.79g                                                   /dev/sda2(307)
  [root_rimage_1] ol Iwi-aor---   9.79g                                                   /dev/zram0(1)
  [root_rmeta_0]  ol ewi-aor---   4.00m                                                   /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---   4.00m                                                   /dev/zram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)
# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor---   9.79g                                 100.00            root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor---   9.79g                                                   /dev/sda2(307)
  [root_rimage_1] ol iwi-aor---   9.79g                                                   /dev/zram0(1)
  [root_rmeta_0]  ol ewi-aor---   4.00m                                                   /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor---   4.00m                                                   /dev/zram0(0)
  swap            ol -wi-ao---- <1.20g                                                    /dev/sda2(0)
# zramctl
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lzo            10G 9.8G  5.3G  5.5G       1

The compressed disk uses a total of 5.5GB of memory to mirror a 9.8G volume size (using in this case 8.5G).

Removal is performed the same way as brd, except that the device is /dev/zram0 instead of /dev/ram0.

Automating the process:

Fortunately, the procedure can be automated on system boot and shutdown with the following scripts (given as examples).

The start method: /usr/sbin/start-raid1-ramdisk: [ https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/start-raid1-ramdisk ]

After a chmod 555 /usr/sbin/start-raid1-ramdisk, running this script on a 4k xfs root file system should show something like:

# /usr/sbin/start-raid1-ramdisk
  Volume group "ol" is already consistent.
RAID1 ramdisk: intending to use 10276864 K of memory for facilitation of [ / ]
  Physical volume "/dev/zram0" successfully created.
  Volume group "ol" successfully extended
  Logical volume ol/root successfully converted.
Waiting for mirror to synchronize...
LVM RAID1 sync of [ / ] took 00:01:53 sec
  Logical volume ol/root changed.
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4           9.8G 9.8G  5.5G  5.8G       1

The stop method: /usr/sbin/stop-raid1-ramdisk: [ https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/stop-raid1-ramdisk ]

After a chmod 555 /usr/sbin/stop-raid1-ramdisk, running this script should show something like:

# /usr/sbin/stop-raid1-ramdisk
  Volume group "ol" is already consistent.
  Logical volume ol/root changed.
  Logical volume ol/root successfully converted.
  Removed "/dev/zram0" from volume group "ol"
  Labels on physical volume "/dev/zram0" successfully wiped.

A service Unit file can also be created: /etc/systemd/system/raid1-ramdisk.service [https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/raid1-ramdisk.service]

[Unit]
Description=Enable RAMdisk RAID 1 on LVM
After=local-fs.target
Before=shutdown.target reboot.target halt.target

[Service]
ExecStart=/usr/sbin/start-raid1-ramdisk
ExecStop=/usr/sbin/stop-raid1-ramdisk
Type=oneshot
RemainAfterExit=yes
TimeoutSec=0

[Install]
WantedBy=multi-user.target
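
To actually put the unit into service, something along these lines should work (this part is not in the original post; adjust the path if you keep the unit file elsewhere):

   # install the unit file and enable it at boot
   cp raid1-ramdisk.service /etc/systemd/system/
   systemctl daemon-reload
   systemctl enable --now raid1-ramdisk.service
   systemctl status raid1-ramdisk.service
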
Conclusion:

When the system disk access problem manifests itself, the ramdisk mirror branch will provide the possibility to investigate the situation. The goal of this procedure is not to keep the system running on this memory-mirror configuration, but to help investigate a bad situation.

When the problem is identified and fixed, I really recommend coming back to a standard configuration -- enjoying the entire memory of the system, a standard system disk, a possible swap space, etc.

I hope the method described here can help. I also want to thank, for their reviews, Philip "Bryce" Copeland, who also created the first prototype of the above scripts, and Mark Kanda, who helped test many aspects of this work.

[Nov 09, 2019] chkservice Is A systemd Unit Manager With A Terminal User Interface

The site is https://github.com/linuxenko/chkservice . The tool is written in C++.
It looks like in version 0.3 the author increased the complexity by adding features which are probably not needed at all.
Nov 07, 2019 | www.linuxuprising.com

chkservice, a terminal user interface (TUI) for managing systemd units, has been updated recently with window resize and search support.

chkservice is a simplistic systemd unit manager that uses ncurses for its terminal interface. Using it you can enable or disable, and start or stop, a systemd unit. It also shows each unit's status (enabled, disabled, static or masked).

You can navigate the chkservice user interface using keyboard shortcuts:

To enable or disable a unit, press Space; to start or stop a unit, press s. You can access the help screen, which shows all available keys, by pressing ?.
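
For reference, the operations chkservice performs map onto ordinary systemctl commands; a rough sketch (sshd.service is just an example unit):

   systemctl list-unit-files --type=service    # roughly the list chkservice displays
   sudo systemctl enable sshd.service          # what Space toggles (enable/disable)
   sudo systemctl disable sshd.service
   sudo systemctl start sshd.service           # what s toggles (start/stop)
   sudo systemctl stop sshd.service
   systemctl status sshd.service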

The command line tool had its first release in August 2017, with no new releases until a few days ago when version 0.2 was released, quickly followed by 0.3.

With the latest 0.3 release, chkservice adds a search feature that allows easily searching through all systemd units.

To search, type / followed by your search query, and press Enter. To search for the next item matching your search query you'll have to type / again, followed by Enter or Ctrl + m (without entering any search text).

Another addition to the latest chkservice is window resize support. In the 0.1 version, the tool would close when the user tried to resize the terminal window. That's no longer the case now, chkservice allowing the resize of the terminal window it runs in.

And finally, the last addition to the latest chkservice 0.3 is G-g navigation support. Press G (Shift + g) to navigate to the bottom, and g to navigate to the top.

Download and install chkservice

The initial (0.1) chkservice version can be found in the official repositories of a few Linux distributions, including Debian and Ubuntu (and Debian- or Ubuntu-based Linux distributions -- e.g. Linux Mint, Pop!_OS, Elementary OS and so on).

There are some third-party repositories available as well, including a Fedora Copr, Ubuntu / Linux Mint PPA, and Arch Linux AUR, but at the time I'm writing this, only the AUR package was updated to the latest chkservice version 0.3.

You may also install chkservice from source. Use the instructions provided in the tool's readme to either create a DEB package or install it directly.

[Nov 08, 2019] How to use cron in Linux by David Both

Nov 06, 2017 | opensource.com
No time for commands? Scheduling tasks with cron means programs can run but you don't have to stay up late.


Instead, I use two service utilities that allow me to run commands, programs, and tasks at predetermined times. The cron and at services enable sysadmins to schedule tasks to run at a specific time in the future. The at service specifies a one-time task that runs at a certain time. The cron service can schedule tasks on a repetitive basis, such as daily, weekly, or monthly.

In this article, I'll introduce the cron service and how to use it.

Common (and uncommon) cron uses

I use the cron service to schedule obvious things, such as regular backups that occur daily at 2 a.m. I also use it for less obvious things.

The crond daemon is the background service that enables cron functionality.

The cron service checks for files in the /var/spool/cron and /etc/cron.d directories and the /etc/anacrontab file. The contents of these files define cron jobs that are to be run at various intervals. The individual user cron files are located in /var/spool/cron , and system services and applications generally add cron job files in the /etc/cron.d directory. The /etc/anacrontab is a special case that will be covered later in this article.
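
A quick, safe way to see what is already scheduled on a host is simply to look in the locations just mentioned (paths are the Red Hat defaults):

   crontab -l              # the current user's crontab, if any
   ls -l /var/spool/cron/  # per-user crontab files (readable by root only)
   ls -l /etc/cron.d/      # cron files dropped in by packages and applications
   cat /etc/anacrontab     # the anacron schedule discussed later in this article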

Using crontab

The cron utility runs based on commands specified in a cron table ( crontab ). Each user, including root, can have a cron file. These files don't exist by default, but can be created in the /var/spool/cron directory using the crontab -e command that's also used to edit a cron file (see the script below). I strongly recommend that you not use a standard editor (such as Vi, Vim, Emacs, Nano, or any of the many other editors that are available). Using the crontab command not only allows you to edit the command, it also restarts the crond daemon when you save and exit the editor. The crontab command uses Vi as its underlying editor, because Vi is always present (on even the most basic of installations).

New cron files are empty, so commands must be added from scratch. I added the job definition example below to my own cron files, just as a quick reference, so I know what the various parts of a command mean. Feel free to copy it for your own use.

# crontab -e
SHELL=/bin/bash
MAILTO=root@example.com
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed

# backup using the rsbu program to the internal 4TB HDD and then 4TB external
01 01 * * * /usr/local/bin/rsbu -vbd1 ; /usr/local/bin/rsbu -vbd2

# Set the hardware clock to keep it in sync with the more accurate system clock
03 05 * * * /sbin/hwclock --systohc

# Perform monthly updates on the first of the month
# 25 04 1 * * /usr/bin/dnf -y update

The crontab command is used to view or edit the cron files.

The first three lines in the code above set up a default environment. The environment must be set to whatever is necessary for a given user because cron does not provide an environment of any kind. The SHELL variable specifies the shell to use when commands are executed. This example specifies the Bash shell. The MAILTO variable sets the email address where cron job results will be sent. These emails can provide the status of the cron job (backups, updates, etc.) and consist of the output you would see if you ran the program manually from the command line. The third line sets up the PATH for the environment. Even though the path is set here, I always prepend the fully qualified path to each executable.

There are several comment lines in the example above that detail the syntax required to define a cron job. I'll break those commands down, then add a few more to show you some more advanced capabilities of crontab files.

01 01 * * * /usr/local/bin/rsbu -vbd1 ; /usr/local/bin/rsbu -vbd2

This line in my /etc/crontab runs a script that performs backups for my systems.

This line runs my self-written Bash shell script, rsbu , that backs up all my systems. This job kicks off at 1:01 a.m. (01 01) every day. The asterisks (*) in positions three, four, and five of the time specification are like file globs, or wildcards, for other time divisions; they specify "every day of the month," "every month," and "every day of the week." This line runs my backups twice; one backs up to an internal dedicated backup hard drive, and the other backs up to an external USB drive that I can take to the safe deposit box.

The following line sets the hardware clock on the computer using the system clock as the source of an accurate time. This line is set to run at 5:03 a.m. (03 05) every day.

03 05 * * * /sbin/hwclock --systohc

This line sets the hardware clock using the system time as the source.

I was using the third and final cron job (commented out) to perform a dnf or yum update at 04:25 a.m. on the first day of each month, but I commented it out so it no longer runs.

# 25 04 1 * * /usr/bin/dnf -y update

This line used to perform a monthly update, but I've commented it out.

Other scheduling tricks

Now let's do some things that are a little more interesting than these basics. Suppose you want to run a particular job every Thursday at 3 p.m.:

00 15 * * Thu /usr/local/bin/mycronjob.sh

This line runs mycronjob.sh every Thursday at 3 p.m.

Or, maybe you need to run quarterly reports after the end of each quarter. The cron service has no option for "The last day of the month," so instead you can use the first day of the following month, as shown below. (This assumes that the data needed for the reports will be ready when the job is set to run.)

02 03 1 1,4,7,10 * /usr/local/bin/reports.sh

This cron job runs quarterly reports on the first day of the month after a quarter ends.

The following shows a job that runs one minute past every hour between 9:01 a.m. and 5:01 p.m.

01 09-17 * * * /usr/local/bin/hourlyreminder.sh

Sometimes you want to run jobs at regular times during normal business hours.

I have encountered situations where I need to run a job every two, three, or four hours. That can be accomplished by dividing the hours by the desired interval, such as */3 for every three hours, or 6-18/3 to run every three hours between 6 a.m. and 6 p.m. Other intervals can be divided similarly; for example, the expression */15 in the minutes position means "run the job every 15 minutes."

*/5 08-18/2 * * * /usr/local/bin/mycronjob.sh

This cron job runs every five minutes during every even-numbered hour between 8 a.m. and 6:55 p.m.

One thing to note: The division expressions must result in a remainder of zero for the job to run. That's why, in this example, the job is set to run every five minutes (08:05, 08:10, 08:15, etc.) during even-numbered hours from 8 a.m. to 6 p.m., but not during any odd-numbered hours. For example, the job will not run at all from 9 a.m. to 9:59 a.m.
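
The remainder rule is easy to verify for yourself. Here is a small illustrative Bash loop (not from the article) that mimics how cron expands an hour field such as 08-18/2:

   # print the hours at which an "08-18/2" hour field matches;
   # cron steps count from the start of the range, so 8, 10, 12, 14, 16 and 18 match
   for hour in $(seq 8 18); do
       if (( (hour - 8) % 2 == 0 )); then
           printf '%02d:00 matches\n' "$hour"
       fi
   done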

I am sure you can come up with many other possibilities based on these examples.

Limiting cron access


Regular users with cron access could make mistakes that, for example, might cause system resources (such as memory and CPU time) to be swamped. To prevent possible misuse, the sysadmin can limit user access by creating a /etc/cron.allow file that contains a list of all users with permission to create cron jobs. The root user cannot be prevented from using cron.
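
A minimal sketch of such a policy (the user names are only examples):

   # only the listed users (plus root) may edit crontabs with the crontab command
   printf '%s\n' dboth student > /etc/cron.allow
   chmod 644 /etc/cron.allow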

By preventing non-root users from creating their own cron jobs, it may be necessary for root to add their cron jobs to the root crontab. "But wait!" you say. "Doesn't that run those jobs as root?" Not necessarily. In the first example in this article, the username field shown in the comments can be used to specify the user ID a job is to have when it runs. This prevents the specified non-root user's jobs from running as root. The following example shows a job definition that runs a job as the user "student":

04 07 * * * student /usr/local/bin/mycronjob.sh

If no user is specified, the job is run as the user that owns the crontab file, root in this case.

cron.d

The directory /etc/cron.d is where some applications, such as SpamAssassin and sysstat, install cron files. Because there is no spamassassin or sysstat user, these programs need a place to locate cron files, so they are placed in /etc/cron.d.

The /etc/cron.d/sysstat file below contains cron jobs that relate to system activity reporting (SAR). These cron files have the same format as a user cron file.

# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

The sysstat package installs the /etc/cron.d/sysstat cron file to run programs for SAR.

The sysstat cron file has two lines that perform tasks. The first line runs the sa1 program every 10 minutes to collect data stored in special binary files in the /var/log/sa directory. Then, every night at 23:53, the sa2 program runs to create a daily summary.

Scheduling tips

Some of the times I set in the crontab files seem rather random -- and to some extent they are. Trying to schedule cron jobs can be challenging, especially as the number of jobs increases. I usually have only a few tasks to schedule on each of my computers, which is simpler than in some of the production and lab environments where I have worked.

One system I administered had around a dozen cron jobs that ran every night and an additional three or four that ran on weekends or the first of the month. That was a challenge, because if too many jobs ran at the same time -- especially the backups and compiles -- the system would run out of RAM and nearly fill the swap file, which resulted in system thrashing while performance tanked, so nothing got done. We added more memory and improved how we scheduled tasks. We also removed a task that was very poorly written and used large amounts of memory.

The crond service assumes that the host computer runs all the time. That means that if the computer is turned off during a period when cron jobs were scheduled to run, they will not run until the next time they are scheduled. This might cause problems if they are critical cron jobs. Fortunately, there is another option for running jobs at regular intervals: anacron .

anacron

The anacron program performs the same function as crond, but it adds the ability to run jobs that were skipped, such as if the computer was off or otherwise unable to run the job for one or more cycles. This is very useful for laptops and other computers that are turned off or put into sleep mode.

As soon as the computer is turned on and booted, anacron checks to see whether configured jobs missed their last scheduled run. If they have, those jobs run immediately, but only once (no matter how many cycles have been missed). For example, if a weekly job was not run for three weeks because the system was shut down while you were on vacation, it would be run soon after you turn the computer on, but only once, not three times.

The anacron program provides some easy options for running regularly scheduled tasks. Just install your scripts in the /etc/cron.[hourly|daily|weekly|monthly] directories, depending how frequently they need to be run.

How does this work? The sequence is simpler than it first appears.

  1. The crond service runs the cron job specified in /etc/cron.d/0hourly .
# Run the hourly jobs
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
01 * * * * root run-parts /etc/cron.hourly

The contents of /etc/cron.d/0hourly cause the shell scripts located in /etc/cron.hourly to run.

  2. The cron job specified in /etc/cron.d/0hourly runs the run-parts program once per hour.
  3. The run-parts program runs all the scripts located in the /etc/cron.hourly directory.
  4. The /etc/cron.hourly directory contains the 0anacron script, which runs the anacron program using the /etc/anacrontab configuration file shown here.
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1         5    cron.daily     nice run-parts /etc/cron.daily
7        25    cron.weekly    nice run-parts /etc/cron.weekly
@monthly 45    cron.monthly   nice run-parts /etc/cron.monthly

The contents of /etc/anacrontab file runs the executable files in the cron.[daily|weekly|monthly] directories at the appropriate times.

  5. The anacron program runs the programs located in /etc/cron.daily once per day; it runs the jobs located in /etc/cron.weekly once per week, and the jobs in cron.monthly once per month. Note the specified delay times in each line that help prevent these jobs from overlapping themselves and other cron jobs.

Instead of placing complete Bash programs in the cron.X directories, I install them in the /usr/local/bin directory, which allows me to run them easily from the command line. Then I add a symlink in the appropriate cron directory, such as /etc/cron.daily .
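
For example, using the rsbu backup script mentioned earlier (the exact name is illustrative):

   # keep the real script in /usr/local/bin so it can also be run by hand,
   # then let the daily run-parts pass pick it up through a symlink
   chmod 755 /usr/local/bin/rsbu
   ln -s /usr/local/bin/rsbu /etc/cron.daily/rsbu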

The anacron program is not designed to run programs at specific times. Rather, it is intended to run programs at intervals that begin at the specified times, such as 3 a.m. (see the START_HOURS_RANGE line in the script just above) of each day, on Sunday (to begin the week), and on the first day of the month. If any one or more cycles are missed, anacron will run the missed jobs once, as soon as possible.

More on setting limits

I use most of these methods for scheduling tasks to run on my computers. All those tasks are ones that need to run with root privileges. It's rare in my experience that regular users really need a cron job. One case was a developer user who needed a cron job to kick off a daily compile in a development lab.

It is important to restrict access to cron functions by non-root users. However, there are circumstances when a user needs to set a task to run at pre-specified times, and cron can allow them to do that. Many users do not understand how to properly configure these tasks using cron and they make mistakes. Those mistakes may be harmless, but, more often than not, they can cause problems. By setting functional policies that cause users to interact with the sysadmin, individual cron jobs are much less likely to interfere with other users and other system functions.

It is possible to set limits on the total resources that can be allocated to individual users or groups, but that is an article for another time.

For more information, the man pages for cron , crontab , anacron , anacrontab , and run-parts all have excellent information and descriptions of how the cron system works.


Ben Cotton on 06 Nov 2017 Permalink

One problem I used to have in an old job was cron jobs that would hang for some reason. This old sysadvent post had some good suggestions for how to deal with that: http://sysadvent.blogspot.com/2009/12/cron-practices.html

Jesper Larsen on 06 Nov 2017 Permalink

Cron is definitely a good tool. But if you need to do more advanced scheduling then Apache Airflow is great for this.

Airflow has a number of advantages over Cron. The most important are: Dependencies (let tasks run after other tasks), nice web based overview, automatic failure recovery and a centralized scheduler. The disadvantages are that you will need to setup the scheduler and some other centralized components on one server and a worker on each machine you want to run stuff on.

You definitely want to use Cron for some stuff. But if you find that Cron is too limited for your use case I would recommend looking into Airflow.

Leslle Satenstein on 13 Nov 2017 Permalink

Hi David,
you have a well done article. Much appreciated. I make use of the @reboot crontab entry. With crontab and root. I run the following.

@reboot /bin/dofstrim.sh

I wanted to run fstrim for my SSD drive once and only once per week.
dofstrim.sh is a script that runs the "fstrim" program once per week, irrespective of the number of times the system is rebooted. I happen to have several Linux systems sharing one computer, and each system has a root crontab with that entry. Since I may hop from Linux to Linux in the day or several times per week, my dofstrim.sh only runs fstrim once per week, irrespective which Linux system I boot. I make use of a common partition to all Linux systems, a partition mounted as "/scratch" and the wonderful Linux command line "date" program.

The dofstrim.sh listing follows below.

#!/bin/bash
# run fstrim either once/week or once/day, not once for every reboot
#
# Use the date function to extract today's day number or week number
# the day number range is 1..366, weekno is 1 to 53
#WEEKLY=0    #once per day
WEEKLY=1     #once per week
lockdir='/scratch/lock/'

if [[ WEEKLY -eq 1 ]]; then
    dayno="$lockdir/dofstrim.weekno"
    today=$(date +%V)
else
    dayno=$lockdir/dofstrim.dayno
    today=$(date +%j)
fi

prevval="000"

if [ -f "$dayno" ]
then
    prevval=$(cat ${dayno})
    if [ x$prevval = x ]; then
        prevval="000"
    fi
else
    mkdir -p $lockdir
fi

if [ ${prevval} -ne ${today} ]
then
    /sbin/fstrim -a
    echo $today > $dayno
fi

I had thought to use anacron, but then fstrim would be run frequently as each linux's anacron would have a similar entry.
The "date" program produces a day number or a week number, depending upon the +%V or +%j

Leslle Satenstein on 13 Nov 2017 Permalink

Running a report on the last day of the month is easy if you use the date program. Use the date function from Linux as shown

*/9 15 28-31 * * [ `date -d +'1 day' +\%d` -eq 1 ] && echo "Tomorrow is the first of month Today(now) is `date`" >> /root/message

Once per day from the 28th to the 31st, the date function is executed.
If the result of date +1day is the first of the month, today must be the last day of the month.

sgtrock on 14 Nov 2017 Permalink

Why not use crontab to launch something like Ansible playbooks instead of simple bash scripts? A lot easier to troubleshoot and manage these days. :-)

[Nov 08, 2019] Bash aliases you can't live without by Seth Kenlon

Jul 31, 2019 | opensource.com

Tired of typing the same long commands over and over? Do you feel inefficient working on the command line? Bash aliases can make a world of difference.

A Bash alias is a method of supplementing or overriding Bash commands with new ones. Bash aliases make it easy for users to customize their experience in a POSIX terminal. They are often defined in $HOME/.bashrc or $HOME/.bash_aliases (which must be loaded by $HOME/.bashrc).

Most distributions add at least some popular aliases in the default .bashrc file of any new user account. These are simple ones to demonstrate the syntax of a Bash alias:

alias ls='ls -F'
alias ll='ls -lh'

Not all distributions ship with pre-populated aliases, though. If you add aliases manually, then you must load them into your current Bash session:

$ source ~/.bashrc

Otherwise, you can close your terminal and re-open it so that it reloads its configuration file.

With those aliases defined in your Bash initialization script, you can then type ll and get the results of ls -lh, and when you type ls you get the output of ls -F instead of the output of plain old ls.

Those aliases are great to have, but they just scratch the surface of what's possible. Here are the top 10 Bash aliases that, once you try them, you won't be able to live without.

Set up first

Before beginning, create a file called ~/.bash_aliases :

$ touch ~/.bash_aliases

Then, make sure that this code appears in your ~/.bashrc file:

if [ -e $HOME/.bash_aliases ]; then
    source $HOME/.bash_aliases
fi

If you want to try any of the aliases in this article for yourself, enter them into your .bash_aliases file, and then load them into your Bash session with the source ~/.bashrc command.

Sort by file size

If you started your computing life with GUI file managers like Nautilus in GNOME, the Finder in MacOS, or Explorer in Windows, then you're probably used to sorting a list of files by their size. You can do that in a terminal as well, but it's not exactly succinct.

Add this alias to your configuration on a GNU system:

alias lt='ls --human-readable --size -1 -S --classify'

This alias replaces lt with an ls command that displays the size of each item, and then sorts it by size, in a single column, with a notation to indicate the kind of file. Load your new alias, and then try it out:

$ source ~/.bashrc
$ lt
total 344K
140K configure*
 44K aclocal.m4
 36K LICENSE
 32K config.status*
 24K Makefile
 24K Makefile.in
 12K config.log
8.0K README.md
4.0K info.slackermedia.Git-portal.json
4.0K git-portal.spec
4.0K flatpak.path.patch
4.0K Makefile.am*
4.0K dot-gitlab.ci.yml
4.0K configure.ac*
   0 autom4te.cache/
   0 share/
   0 bin/
   0 install-sh@
   0 compile@
   0 missing@
   0 COPYING@

On MacOS or BSD, the ls command doesn't have the same options, so this alias works instead:

alias lt='du -sh * | sort -h'

The results of this version are a little different:

$ du -sh * | sort -h
0 compile
0 COPYING
0 install-sh
0 missing
4.0K configure.ac
4.0K dot-gitlab.ci.yml
4.0K flatpak.path.patch
4.0K git-portal.spec
4.0K info.slackermedia.Git-portal.json
4.0K Makefile.am
8.0K README.md
12K config.log
16K bin
24K Makefile
24K Makefile.in
32K config.status
36K LICENSE
44K aclocal.m4
60K share
140K configure
476K autom4te.cache

In fact, even on Linux, that command is useful, because using ls lists directories and symlinks as being 0 in size, which may not be the information you actually want. It's your choice.

Thanks to Brad Alexander for this alias idea.

View only mounted drives

The mount command used to be so simple. With just one command, you could get a list of all the mounted filesystems on your computer, and it was frequently used for an overview of what drives were attached to a workstation. It used to be impressive to see more than three or four entries because most computers don't have many more USB ports than that, so the results were manageable.

Computers are a little more complicated now, and between LVM, physical drives, network storage, and virtual filesystems, the results of mount can be difficult to parse:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=8131024k,nr_inodes=2032756,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
[...]
/dev/nvme0n1p2 on /boot type ext4 (rw,relatime,seclabel)
/dev/nvme0n1p1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
[...]
gvfsd-fuse on /run/user/100977/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=100977,group_id=100977)
/dev/sda1 on /run/media/seth/pocket type ext4 (rw,nosuid,nodev,relatime,seclabel,uhelper=udisks2)
/dev/sdc1 on /run/media/seth/trip type ext4 (rw,nosuid,nodev,relatime,seclabel,uhelper=udisks2)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

To solve that problem, try an alias like this:

alias mnt="mount | awk -F' ' '{ printf \"%s\t%s\n\",\$1,\$3; }' | column -t | egrep ^/dev/ | sort"

This alias uses awk to parse the output of mount by column, reducing the output to what you are probably looking for (which hard drives, and not file systems, are mounted):

$ mnt
/dev/mapper/fedora-root /
/dev/nvme0n1p1 /boot/efi
/dev/nvme0n1p2 /boot
/dev/sda1 /run/media/seth/pocket
/dev/sdc1 /run/media/seth/trip

On MacOS, the mount command doesn't provide terribly verbose output, so an alias may be overkill. However, if you prefer a succinct report, try this:

alias mnt='mount | grep -E ^/dev | column -t'

The results:

$ mnt
/dev/disk1s1 on / (apfs, local, journaled)
/dev/disk1s4 on /private/var/vm (apfs, local, noexec, journaled, noatime, nobrowse)

Find a command in your grep history

Sometimes you figure out how to do something in the terminal, and promise yourself that you'll never forget what you've just learned. Then an hour goes by, and you've completely forgotten what you did.

Searching through your Bash history is something everyone has to do from time to time. If you know exactly what you're searching for, you can use Ctrl+R to do a reverse search through your history, but sometimes you can't remember the exact command you want to find.

Here's an alias to make that task a little easier:

alias gh='history|grep'

Here's an example of how to use it:

$ gh bash
482 cat ~/.bashrc | grep _alias
498 emacs ~/.bashrc
530 emacs ~/.bash_aliases
531 source ~/.bashrc

Sort by modification time

It happens every Monday: You get to work, you sit down at your computer, you open a terminal, and you find you've forgotten what you were doing last Friday. What you need is an alias to list the most recently modified files.

You can use the ls command to create an alias to help you find where you left off:

alias left='ls -t -1'

The output is simple, although you can extend it with the -l long-listing option if you prefer. The alias, as listed, displays this:

$ left
demo.jpeg
demo.xcf
design-proposal.md
rejects.txt
brainstorm.txt
query-letter.xml

Count files

If you need to know how many files you have in a directory, the solution is one of the most classic examples of UNIX command construction: You list files with the ls command, control its output to be only one column with the -1 option, and then pipe that output to the wc (word count) command to count the lines, one per file.

It's a brilliant demonstration of how the UNIX philosophy allows users to build their own solutions using small system components. This command combination is also a lot to type if you happen to do it several times a day, and it doesn't exactly work for a directory of directories without using the -R option, which introduces new lines to the output and renders the exercise useless.

Instead, this alias makes the process easy:

alias count='find . -type f | wc -l'

This one counts files, ignoring directories, but not the contents of directories. If you have a project folder containing two directories, each of which contains two files, the alias returns four, because there are four files in the entire project.

$ ls
foo bar
$ count
4

Create a Python virtual environment

Do you code in Python?

Do you code in Python a lot?

If you do, then you know that creating a Python virtual environment requires, at the very least, 53 keystrokes.
That's 49 too many, but that's easily circumvented with two new aliases called ve and va :

alias ve='python3 -m venv ./venv'
alias va='source ./venv/bin/activate'

Running ve creates a new directory, called venv , containing the usual virtual environment filesystem for Python3. The va alias activates the environment in your current shell:

$ cd my-project
$ ve
$ va
(venv) $

Add a copy progress bar

Everybody pokes fun at progress bars because they're infamously inaccurate. And yet, deep down, we all seem to want them. The UNIX cp command has no progress bar, but it does have a -v option for verbosity, meaning that it echoes the name of each file being copied to your terminal. That's a pretty good hack, but it doesn't work so well when you're copying one big file and want some indication of how much of the file has yet to be transferred.

The pv command provides a progress bar during copy, but it's not common as a default application. On the other hand, the rsync command is included in the default installation of nearly every POSIX system available, and it's widely recognized as one of the smartest ways to copy files both remotely and locally.

Better yet, it has a built-in progress bar.

alias cpv='rsync -ah --info=progress2'

Using this alias is the same as using the cp command:

$ cpv bigfile.flac /run/media/seth/audio/
3.83M 6% 213.15MB/s 0:00:00 (xfr#4, to-chk=0/4)

An interesting side effect of using this command is that rsync copies both files and directories without the -r flag that cp would otherwise require.

Protect yourself from file removal accidents

You shouldn't use the rm command. The rm manual even says so:

Warning : If you use 'rm' to remove a file, it is usually possible to recover the contents of that file. If you want more assurance that the contents are truly unrecoverable, consider using 'shred'.

If you want to remove a file, you should move the file to your Trash, just as you do when using a desktop.

POSIX makes this easy, because the Trash is an accessible, actual location in your filesystem. That location may change depending on your platform: on a FreeDesktop-compliant system, the Trash is located at ~/.local/share/Trash, while on MacOS it's ~/.Trash, but either way, it's just a directory into which you place files that you want out of sight until you're ready to erase them forever.

This simple alias provides a way to toss files into the Trash bin from your terminal:

alias tcn='mv --force -t ~/.local/share/Trash'

This alias uses the little-known -t flag of mv, which lets you name the target directory up front so the files to move can come last, instead of the usual requirement that the destination be the final argument. Now you can use your new command to move files and folders to your system Trash:

$ ls
foo bar
$ tcn foo
$ ls
bar

Now the file is "gone," but only until you realize in a cold sweat that you still need it. At that point, you can rescue the file from your system Trash; be sure to tip the Bash and mv developers on the way out.

Note: If you need a more robust Trash command with better FreeDesktop compliance, see Trashy .

Simplify your Git workflow

Everyone has a unique workflow, but there are usually repetitive tasks no matter what. If you work with Git on a regular basis, then there's probably some sequence you find yourself repeating pretty frequently. Maybe you find yourself going back to the master branch and pulling the latest changes over and over again during the day, or maybe you find yourself creating tags and then pushing them to the remote, or maybe it's something else entirely.

No matter what Git incantation you've grown tired of typing, you may be able to alleviate some pain with a Bash alias. Largely thanks to its ability to pass arguments to hooks, Git has a rich set of introspective commands that save you from having to perform uncanny feats in Bash.

For instance, while you might struggle to locate, in Bash, a project's top-level directory (which, as far as Bash is concerned, is an entirely arbitrary designation, since the absolute top level to a computer is the root directory), Git knows its top level with a simple query. If you study up on Git hooks, you'll find yourself able to find out all kinds of information that Bash knows nothing about, but you can leverage that information with a Bash alias.

Here's an alias to find the top level of a Git project, no matter where in that project you are currently working, and then to change directory to it, change to the master branch, and perform a Git pull:

alias startgit='cd `git rev-parse --show-toplevel` && git checkout master && git pull'

This kind of alias is by no means a universally useful alias, but it demonstrates how a relatively simple alias can eliminate a lot of laborious navigation, commands, and waiting for prompts.

A simpler, and probably more universal, alias returns you to the Git project's top level. This alias is useful because when you're working on a project, that project more or less becomes your "temporary home" directory. It should be as simple to go "home" as it is to go to your actual home, and here's an alias to do it:

alias cg='cd `git rev-parse --show-toplevel`'

Now the command cg takes you to the top of your Git project, no matter how deep into its directory structure you have descended.

Change directories and view the contents at the same time

It was once (allegedly) proposed by a leading scientist that we could solve many of the planet's energy problems by harnessing the energy expended by geeks typing cd followed by ls .
It's a common pattern, because generally when you change directories, you have the impulse or the need to see what's around.

But "walking" your computer's directory tree doesn't have to be a start-and-stop process.

This one's cheating, because it's not an alias at all, but it's a great excuse to explore Bash functions. While aliases are great for quick substitutions, Bash allows you to add local functions in your .bashrc file (or a separate functions file that you load into .bashrc , just as you do your aliases file).

To keep things modular, create a new file called ~/.bash_functions and then have your .bashrc load it:

if [ -e "$HOME/.bash_functions" ]; then
    source "$HOME/.bash_functions"
fi

In the functions file, add this code:

function cl() {
    DIR="$*";
    # if no DIR given, go home
    if [ $# -lt 1 ]; then
        DIR=$HOME;
    fi;
    builtin cd "${DIR}" && \
    # use your preferred ls command
    ls -F --color=auto
}

Load the function into your Bash session and then try it out:

$ source ~/.bash_functions
$ cl Documents
foo bar baz
$ pwd
/home/seth/Documents
$ cl ..
Desktop  Documents  Downloads
[...]
$ pwd
/home/seth

Functions are much more flexible than aliases, but with that flexibility comes the responsibility for you to ensure that your code makes sense and does what you expect. Aliases are meant to be simple, so keep them easy, but useful. For serious modifications to how Bash behaves, use functions or custom shell scripts saved to a location in your PATH .

For the record, there are some clever hacks to implement the cd and ls sequence as an alias, so if you're patient enough, then the sky is the limit even using humble aliases.

Start aliasing and functioning

Customizing your environment is what makes Linux fun, and increasing your efficiency is what makes Linux life-changing. Get started with simple aliases, graduate to functions, and post your must-have aliases in the comments!


ACG on 31 Jul 2019 Permalink

One function I like a lot is a function that diffs a file and its backup.
It goes something like

#!/usr/bin/env bash
file="${1:?File not given}"

if [[ ! -e "$file" || ! -e "$file"~ ]]; then
echo "File doesn't exist or has no backup" 1>&2
exit 1
fi

diff --color=always "$file"{~,} | less -r

I may have gotten the if wrong, but you get the idea. I'm typing this on my phone, away from home.
Cheers

Seth Kenlon on 31 Jul 2019 Permalink

That's pretty slick! I like it.

My backup tool of choice (rdiff-backup) handles these sorts of comparisons pretty well, so I tend to be confident in my backup files. That said, there's always the edge case, and this kind of function is a great solution for those. Thanks!

Kevin Cole on 13 Aug 2019 Permalink

A few of my "cannot-live-withouts" are regex based:

Decomment removes full-line comments and blank lines. For example, when looking at a "stock" /etc/httpd/whatever.conf file that has a gazillion lines in it,

alias decomment='egrep -v "^[[:space:]]*((#|;|//).*)?$" '

will show you that only four lines in the file actually DO anything, and the gazillion minus four are comments. I use this ALL the time with config files, Python (and other languages) code, and god knows where else.

Then there's unprintables and expletives which are both very similar:

alias unprintable='grep --color="auto" -P -n "[\x00-\x1E]"'
alias expletives='grep --color="auto" -P -n "[^\x00-\x7E]" '

The first shows which lines (with line numbers) in a file contain control characters, and the second shows which lines in a file contain anything "above" a RUBOUT, er, excuse me, I mean above ASCII 127. (I feel old.) ;-) Handy when, for example, someone gives you a program that they edited or created with LibreOffice, and oops... half of the quoted strings have "real" curly opening and closing quote marks instead of ASCII 0x22 "straight" quote mark delimiters... But there's actually a few curlies you want to keep, so a "nuke 'em all in one swell foop" approach won't work.

Seth Kenlon on 14 Aug 2019 Permalink

These are great!

Dan Jones on 13 Aug 2019 Permalink

Your `cl` function could be simplified, since `cd` without arguments already goes to home.

```
function cl() {
cd "$@" && \
ls -F --color=auto
}
```

Seth Kenlon on 14 Aug 2019 Permalink

Nice!

jkeener on 20 Aug 2019 Permalink

The first alias in my .bash_aliases file is always:

alias realias='vim ~/.bash_aliases; source ~/.bash_aliases'

replace vim with your favorite editor or $VISUAL

bhuvana on 04 Oct 2019 Permalink

Thanks for this post! I have created a Github repo- https://github.com/bhuvana-guna/awesome-bash-shortcuts
with a motive to create an extended list of aliases/functions for various programs. As I am a newbie to terminal and linux, please do contribute to it with these and other super awesome utilities and help others easily access them.

[Nov 08, 2019] A guide to logging in Python by Mario Corchero

Nov 08, 2019 | opensource.com

13 Sep 2017

Logging is a critical way to understand what's going on inside an application.

There is little worse as a developer than trying to figure out why an application is not working if you don't know what is going on inside it. Sometimes you can't even tell whether the system is working as designed at all.

When applications are running in production, they become black boxes that need to be traced and monitored. One of the simplest, yet most important ways to do so is by logging. Logging allows us -- at the time we develop our software -- to instruct the program to emit information while the system is running that will be useful for us and our sysadmins.

The same way we document code for future developers, we should make new software generate adequate logs for developers and sysadmins. Logs are a critical part of the system documentation about an application's runtime status. When instrumenting your software with logs, think of it like writing documentation for the developers and sysadmins who will maintain the system in the future.

Some purists argue that a disciplined developer who uses logging and testing should hardly need an interactive debugger. If we cannot reason about our application during development with verbose logging, it will be even harder to do it when our code is running in production.

This article looks at Python's logging module, its design, and ways to adapt it for more complex use cases. This is not intended as documentation for developers, rather as a guide to show how the Python logging module is built and to encourage the curious to delve deeper.

Why use the logging module?

A developer might ask: why aren't simple print statements sufficient? The logging module offers multiple benefits, including fine-grained severity levels, control over where and how messages are emitted, and a clean separation between what we log and how we log it.

This last point, the separation of what we log from how we log it, enables collaboration between different parts of the software. As an example, it allows the developer of a framework or library to add logs and lets the sysadmin, or whoever is in charge of the runtime configuration, decide what should be logged at a later point.
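
As a minimal sketch of that separation (the mylib.payments module name is made up for illustration), the library code decides only what to log, while the application that imports it decides how, and whether, those messages appear:

import logging

# library code (imagine this lives in a hypothetical mylib/payments.py):
# it decides only WHAT to log
logger = logging.getLogger("mylib.payments")

def sell_stock(price):
    logger.info("Stock was sold at %s", price)

# application code: it decides HOW (and whether) those messages are emitted
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")

sell_stock(42)

If the application configures nothing at all, the library's info() calls simply produce no output.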

What's in the logging module

The logging module beautifully separates the responsibility of each of its parts (following the Apache Log4j API's approach). Let's look at how a log line travels around the module's code and explore its different parts.

Logger

Loggers are the objects a developer usually interacts with. They are the main APIs that indicate what we want to log.

Given an instance of a logger , we can categorize and ask for messages to be emitted without worrying about how or where they will be emitted.

For example, when we write logger.info("Stock was sold at %s", price) we have the following model in mind:

[diagram: a logger emitting a log line]

We request a line and we assume some code is executed in the logger that makes that line appear in the console/file. But what actually is happening inside?

Log records

Log records are packages that the logging module uses to pass all required information around. They contain information about the function where the log was requested, the string that was passed, arguments, call stack information, etc.

These are the objects that are being logged. Every time we invoke our loggers, we are creating instances of these objects. But how do objects like these get serialized into a stream? Via handlers!

Handlers

Handlers emit the log records to an output. They take log records and process them according to what they were built for.

As an example, a FileHandler will take a log record and append it to a file.

The standard logging module already comes with multiple built-in handlers, such as StreamHandler (console output), FileHandler and RotatingFileHandler (files), SMTPHandler (email), SysLogHandler (syslog), and HTTPHandler (HTTP endpoints).

We now have a model that's closer to reality:

[diagram: a logger passing log records to handlers]

But most handlers work with simple strings (SMTPHandler, FileHandler, etc.), so you may be wondering how those structured LogRecords are transformed into easy-to-serialize bytes...

Formatters

Let me present the Formatters. Formatters are in charge of serializing the metadata-rich LogRecord into a string. There is a default formatter if none is provided.

The generic formatter class provided by the logging library takes a template and style as input. Then placeholders can be declared for all the attributes in a LogRecord object.

As an example: '%(asctime)s %(levelname)s %(name)s: %(message)s' will generate logs like 2017-07-19 15:31:13,942 INFO parent.child: Hello EuroPython .

Note that the attribute message is the result of interpolating the log's original template with the arguments provided. (e.g., for logger.info("Hello %s", "Laszlo") , the message will be "Hello Laszlo").

All default attributes can be found in the logging documentation .
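
For instance, a small sketch of wiring this formatter to a handler and a logger looks like this:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(levelname)s %(name)s: %(message)s'))

logger = logging.getLogger("parent.child")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# prints something like: 2017-07-19 15:31:13,942 INFO parent.child: Hello EuroPython
logger.info("Hello %s", "EuroPython")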

OK, now that we know about formatters, our model has changed again:

[diagram: loggers, handlers, and formatters]

Filters

The last objects in our logging toolkit are filters.

Filters allow for finer-grained control of which logs should be emitted. Multiple filters can be attached to both loggers and handlers. For a log to be emitted, all filters should allow the record to pass.

Users can declare their own filters as objects using a filter method that takes a record as input and returns True / False as output.
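
For example, a minimal filter that drops records containing a particular substring might look like the sketch below (the class name and the "healthcheck" string are purely illustrative):

import logging

class NoHealthChecks(logging.Filter):
    def filter(self, record):
        # returning False drops the record; True lets it through
        return "healthcheck" not in record.getMessage()

handler = logging.StreamHandler()
handler.addFilter(NoHealthChecks())

logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.warning("GET /healthcheck 200")   # dropped by the handler's filter
logger.warning("GET /orders 500")        # emitted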

With this in mind, here is the current logging workflow:

[diagram: the logging workflow with filters]

The logger hierarchy

At this point, you might be impressed by the amount of complexity and configuration that the module is hiding so nicely for you, but there is even more to consider: the logger hierarchy.

We can create a logger via logging.getLogger(<logger_name>) . The string passed as an argument to getLogger can define a hierarchy by separating the elements using dots.

As an example, logging.getLogger("parent.child") will create a logger "child" with a parent logger named "parent." Loggers are global objects managed by the logging module, so they can be retrieved conveniently anywhere during our project.

Logger instances are also known as channels. The dotted naming scheme lets the developer define the channels and how they nest.

After the log record has been passed to all the handlers within the logger, the parents' handlers will be called recursively until we reach the top logger (defined as an empty string) or a logger has configured propagate = False . We can see it in the updated diagram:

[diagram: log record propagation up the logger hierarchy]

Note that the parent logger is not called, only its handlers. This means that filters and other code in the logger class won't be executed on the parents. This is a common pitfall when adding filters to loggers.
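
A short sketch of the propagation behavior described above:

import logging

parent = logging.getLogger("parent")
parent.addHandler(logging.StreamHandler())     # the only handler lives on the parent

child = logging.getLogger("parent.child")
child.warning("emitted by the parent's handler")   # the record propagates upward

child.propagate = False
child.addHandler(logging.StreamHandler())
child.warning("handled only by the child's own handler now")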

Recapping the workflow

We've examined the split in responsibility and how we can fine-tune log filtering. But there are two other attributes we haven't mentioned yet:

  1. Loggers can be disabled, thereby not allowing any record to be emitted from them.
  2. An effective level can be configured in both loggers and handlers.

As an example, when a logger has configured a level of INFO , only INFO levels and above will be passed. The same rule applies to handlers.
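
A brief sketch of how the two thresholds interact (the logger name "app" is just illustrative):

import logging

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)           # the logger accepts DEBUG and above...

handler = logging.StreamHandler()
handler.setLevel(logging.WARNING)        # ...but this handler only emits WARNING and above
logger.addHandler(handler)

logger.debug("passes the logger, dropped by the handler")
logger.error("emitted")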

With all this in mind, the final flow diagram in the logging documentation looks like this:

[diagram: the full logging flow, as shown in the logging documentation]

How to use logging

Now that we've looked at the logging module's parts and design, it's time to examine how a developer interacts with it. Here is a code example:

import logging

def sample_function(secret_parameter):
    logger = logging.getLogger(__name__)  # __name__=projectA.moduleB
    logger.debug("Going to perform magic with '%s'", secret_parameter)
    ...
    try:
        result = do_magic(secret_parameter)
    except IndexError:
        logger.exception("OMG it happened again, someone please tell Laszlo")
    except:
        logger.info("Unexpected exception", exc_info=True)
        raise
    else:
        logger.info("Magic with '%s' resulted in '%s'", secret_parameter, result, stack_info=True)

This creates a logger using the module's __name__ . It will create channels and hierarchies that mirror the project structure, since Python module names are joined with dots.

The logger variable references the "projectA.moduleB" logger, which has "projectA" as its parent, which in turn has the root logger as its parent.

On line 5, we see how to perform calls to emit logs. We can use one of the methods debug , info , error , or critical to log using the appropriate level.

When logging a message, in addition to the template arguments, we can pass keyword arguments with specific meaning. The most interesting are exc_info and stack_info . These will add information about the current exception and the stack frame, respectively. For convenience, a method exception is available in the logger objects, which is the same as calling error with exc_info=True .

These are the basics of how to use the logging module. ʘ‿ʘ. But it is also worth mentioning some uses that are usually considered bad practices.

Greedy string formatting

Using logger.info("string template {}".format(argument)) should be avoided whenever possible in favor of logger.info("string template %s", argument) . This is a better practice, as the actual string interpolation happens only if the log record will be emitted. Doing it the other way wastes cycles whenever the effective level is above INFO, because the interpolation still occurs even though the message is discarded.
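
A small illustration of the difference, assuming a configuration where the effective level is WARNING, so neither message is actually emitted:

import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

big_payload = {"key": "value" * 1000}

logger.info("state: {}".format(big_payload))   # str.format() runs even though nothing is emitted
logger.info("state: %s", big_payload)          # interpolation is skipped: INFO is below the threshold

With the %s form, the logger checks the effective level before building the message, so the expensive interpolation never runs.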

Capturing and formatting exceptions

Quite often, we want to log information about the exception in a catch block, and it might feel intuitive to use:

try:
    ...
except Exception as error:
    logger.info("Something bad happened: %s", error)

But that code can give us log lines like Something bad happened: "secret_key." This is not that useful. If we use exc_info as illustrated previously, it will produce the following:

try:
    ...
except Exception:
    logger.info("Something bad happened", exc_info=True)

Something bad happened
Traceback (most recent call last):
  File "sample_project.py", line 10, in code
    inner_code()
  File "sample_project.py", line 6, in inner_code
    x = data["secret_key"]
KeyError: 'secret_key'

This not only contains the exact source of the exception, but also the type.

Configuring our loggers

It's easy to instrument our software, but we also need to configure the logging stack and specify how those records will be emitted.

There are multiple ways to configure the logging stack.

BasicConfig

This is by far the simplest way to configure logging. Just doing logging.basicConfig(level="INFO") sets up a basic StreamHandler that will log everything on the INFO and above levels to the console. There are arguments to customize this basic configuration. Some of them are:

Argument   Description                                                                        Example
filename   Create a FileHandler using the specified filename, rather than a StreamHandler    /var/logs/logs.txt
format     Use the specified format string for the handler                                   '%(asctime)s %(message)s'
datefmt    Use the specified date/time format                                                 "%H:%M:%S"
level      Set the root logger level to the specified level                                   "INFO"

This is a simple and practical way to configure small scripts.

Note, basicConfig only works the first time it is called in a runtime. If you have already configured your root logger, calling basicConfig will have no effect.
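
For example, a sketch combining several of these arguments (the log file path is only an example):

import logging

logging.basicConfig(
    filename="/tmp/example.log",   # example path: a FileHandler is created instead of a StreamHandler
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    datefmt="%H:%M:%S",
    level="INFO",
)

logging.getLogger(__name__).info("application started")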

DictConfig

The configuration for all elements and how to connect them can be specified as a dictionary. This dictionary should have different sections for loggers, handlers, formatters, and some basic global parameters.

Here's an example:

config = {
    'disable_existing_loggers': False,
    'version': 1,
    'formatters': {
        'short': {
            'format': '%(asctime)s %(levelname)s %(name)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'INFO',
            'formatter': 'short',
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': 'ERROR',
        },
        'plugins': {
            'handlers': ['console'],
            'level': 'INFO',
            'propagate': False
        }
    },
}
import logging.config
logging.config.dictConfig(config)

When invoked, dictConfig disables all existing loggers unless disable_existing_loggers is set to False . Setting it to False is usually what you want, because many modules declare a global logger that is instantiated at import time, before dictConfig is called.

The schema accepted by the dictConfig method is described in the logging.config documentation. Often, this configuration is stored in a YAML file and loaded from there. Many developers prefer this over using fileConfig , as it offers better support for customization.
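
Loading such a configuration from a YAML file takes only a few lines; this sketch assumes the third-party PyYAML package and a hypothetical logging.yaml file that mirrors the dictionary above:

import logging.config

import yaml   # PyYAML, a third-party package

with open("logging.yaml") as config_file:
    config = yaml.safe_load(config_file)

logging.config.dictConfig(config)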

Extending logging

Thanks to the way it is designed, it is easy to extend the logging module. Let's see some examples:

Logging JSON

If we want, we can log JSON by creating a custom formatter that transforms the log records into a JSON-encoded string:

import logging
import logging.config
import json

ATTR_TO_JSON = ['created', 'filename', 'funcName', 'levelname', 'lineno', 'module',
                'msecs', 'msg', 'name', 'pathname', 'process', 'processName',
                'relativeCreated', 'thread', 'threadName']

class JsonFormatter:
    def format(self, record):
        obj = {attr: getattr(record, attr)
               for attr in ATTR_TO_JSON}
        return json.dumps(obj, indent=4)

handler = logging.StreamHandler()
handler.formatter = JsonFormatter()
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.error("Hello")

Adding further context

On the formatters, we can specify any log record attribute.

We can inject attributes in multiple ways. In this example, we abuse filters to enrich the records.

import logging
import logging.config

GLOBAL_STUFF = 1

class ContextFilter(logging.Filter):
    def filter(self, record):
        global GLOBAL_STUFF
        GLOBAL_STUFF += 1
        record.global_data = GLOBAL_STUFF
        return True

handler = logging.StreamHandler()
handler.formatter = logging.Formatter("%(global_data)s %(message)s")
handler.addFilter(ContextFilter())
logger = logging.getLogger(__name__)
logger.addHandler(handler)

logger.error("Hi1")
logger.error("Hi2")

This effectively adds an attribute to all the records that go through that logger. The formatter will then include it in the log line.

Note that this impacts all log records in your application, including libraries or other frameworks that you might be using and for which you are emitting logs. It can be used to log things like a unique request ID on all log lines to track requests or to add extra contextual information.

Starting with Python 3.2, you can use setLogRecordFactory to capture all log record creation and inject extra information. The extra keyword argument and the LoggerAdapter class may also be of interest.
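
As a sketch of the record-factory approach (the request_id value is hard-coded purely for illustration):

import logging

old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.request_id = "abc-123"   # hard-coded for illustration; normally taken from the current request
    return record

logging.setLogRecordFactory(record_factory)

handler = logging.StreamHandler()
handler.formatter = logging.Formatter("%(request_id)s %(message)s")
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.error("Hello")   # prints: abc-123 Hello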

Buffering logs

Sometimes we would like to have access to debug logs when an error happens. This is feasible by creating a buffered handler that will log the last debug messages after an error occurs. See the following code as a non-curated example:

import logging
import logging.handlers

class SmartBufferHandler(logging.handlers.MemoryHandler):
    def __init__(self, num_buffered, *args, **kwargs):
        kwargs["capacity"] = num_buffered + 2  # +2: one for the current record, one for the pre-pop
        super().__init__(*args, **kwargs)

    def emit(self, record):
        if len(self.buffer) == self.capacity - 1:
            self.buffer.pop(0)
        super().emit(record)

handler = SmartBufferHandler(num_buffered=2, target=logging.StreamHandler(), flushLevel=logging.ERROR)
logger = logging.getLogger(__name__)
logger.setLevel("DEBUG")
logger.addHandler(handler)

logger.error("Hello1")
logger.debug("Hello2")  # This line won't be logged
logger.debug("Hello3")
logger.debug("Hello4")
logger.error("Hello5")  # The error flushes the buffer, so the last two debug lines are logged as well

For more information

This introduction to the logging library's flexibility and configurability aims to demonstrate the beauty of how its design splits concerns. It also offers a solid foundation for anyone interested in a deeper dive into the logging documentation and the how-to guide . Although this article isn't a comprehensive guide to Python logging, here are answers to a few frequently asked questions.

My library emits a "no logger configured" warning

Check how to configure logging in a library from "The Hitchhiker's Guide to Python."

What happens if a logger has no level configured?

The effective level of the logger will then be defined recursively by its parents.

All my logs are in local time. How do I log in UTC?

Formatters are the answer! You need to set the converter attribute of your formatter to generate UTC times. Use converter = time.gmtime .
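
For example:

import logging
import time

formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
formatter.converter = time.gmtime   # render asctime in UTC instead of local time

handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.warning("logged with a UTC timestamp")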

[Nov 08, 2019] The Python community is special

Oct 29, 2019 | opensource.com

Moshe Zadka (Community Moderator)

The Python community is amazing. It was one of the first to adopt a code of conduct, first for the Python Software Foundation and then for PyCon. There is a real commitment to diversity and inclusion: blog posts and conference talks on this theme are frequent, thoughtful, and well-read by Python community members.

While the community is global, there is a lot of great activity in the local community as well. Local Python meet-ups are a great place to meet wonderful people who are smart, experienced, and eager to help. A lot of meet-ups will explicitly have time set aside for experienced people to help newcomers who want to learn a new concept or to get past an issue with their code. My local community took the time to support me as I began my Python journey, and I am privileged to continue to give back to new developers.

Whether you can attend a local community meet-up or you spend time with the online Python community across IRC, Slack, and Twitter, I am sure you will meet lovely people who want to help you succeed as a developer.

[Nov 08, 2019] Selected Python-related articles from opensource.com

There are lots of ways to save time as a sysadmin. Here's one way to build a web app using open source tools to automate a chunk of your daily routine away
Nov 08, 2019 | opensource.com
  1. A dozen ways to learn Python These resources will get you started and well on your way to proficiency with Python. Don Watkins (Community Moderator) 27 Aug 2019
  2. Building a non-breaking breakpoint for Python debugging Have you ever wondered how to speed up a debugger? Here are some lessons learned while building one for Python. Liran Haimovitch 13 Aug 2019
  3. Sending custom emails with Python Customize your group emails with Mailmerge, a command-line program that can handle simple and complex emails. Brian "bex" Exelbierd (Red Hat) 08 Aug 2019
  4. Save and load Python data with JSON The JSON format saves you from creating your own data formats, and is particularly easy to learn if you already know Python. Here's how to use it with Python. Seth Kenlon (Red Hat) 16 Jul 2019
  5. Parse arguments with Python Parse Python like a pro with the argparse module. Seth Kenlon (Red Hat) 03 Jul 2019
  6. Learn object-oriented programming with Python Make your code more modular with Python classes. Seth Kenlon (Red Hat) 05 Jul 2019
  7. Get modular with Python functions Minimize your coding workload by using Python functions for repeating tasks. Seth Kenlon (Red Hat) 01 Jul 2019
  8. Learn Python with these awesome resources Expand your Python knowledge by adding these resources to your personal learning network. Don Watkins (Community Moderator) 31 May 2019
  9. Run your blog on GitHub Pages with Python Create a blog with Pelican, a Python-based blogging platform that works well with GitHub. Erik O'Shaughnessy 23 May 2019
  10. 100 ways to learn Python and R for data science Want to learn data science? Find recommended courses in the Data Science Repo, a community-sourced directory of Python and R learning resources. Chris Engelhardt Dorris Scott Annu Singh 15 May 2019
  11. How to analyze log data with Python and Apache Spark Case study with NASA logs to show how Spark can be leveraged for analyzing data at scale. Dipanjan (DJ) Sarkar (Red Hat) 14 May 2019
  12. Learn Python by teaching in your community A free and fun way to learn by teaching others. Don Watkins (Community Moderator) 13 May 2019
  13. Format Python however you like with Black Learn more about solving common Python problems in our series covering seven PyPI libraries. Moshe Zadka (Community Moderator) 02 May 2019
  14. Write faster C extensions for Python with Cython Learn more about solving common Python problems in our series covering seven PyPI libraries. Moshe Zadka (Community Moderator) 01 May 2019
  15. Managing Python packages the right way Don't fall victim to the perils of Python package management. László Kiss Kollár 04 Apr 2019
  16. 7 steps for hunting down Python code bugs Learn some tricks to minimize the time you spend tracking down the reasons your code fails. Maria Mckinley 08 Feb 2019
  17. Getting started with Pelican: A Python-based static site generator Pelican is a great choice for Python users who want to self-host a simple website or blog. Craig Sebenik 07 Jan 2019
  18. Python 3.7 beginner's cheat sheet Get acquainted with Python's built-in pieces. Nicholas Hunt-Walker 20 Sep 2018
  19. 18 Python programming books for beginners and veterans Get started with this popular language or buff up on your coding skills with this curated book list. Jen Wike Huger (Red Hat) 10 Sep 2018
  20. Why I love Xonsh Ever wondered if Python can be your shell? Moshe Zadka (Community Moderator) 04 Sep 2018

[Nov 08, 2019] Pylint: Making your Python code consistent

Oct 21, 2019 | opensource.com

Pylint is your friend when you want to avoid arguing about code complexity.

Moshe Zadka (Community Moderator)

flake8 and black will take care of "local" style: where the newlines occur, how comments are formatted, or find issues like commented out code or bad practices in log formatting.

Pylint is extremely aggressive by default. It will offer strong opinions on everything from checking if declared interfaces are actually implemented to opportunities to refactor duplicate code, which can be a lot to a new user. One way of introducing it gently to a project, or a team, is to start by turning all checkers off, and then enabling checkers one by one. This is especially useful if you already use flake8, black, and mypy : Pylint has quite a few checkers that overlap in functionality.

However, one of the things unique to Pylint is the ability to enforce higher-level issues: for example, number of lines in a function, or number of methods in a class.

These numbers might be different from project to project and can depend on the development team's preferences. However, once the team comes to an agreement about the parameters, it is useful to enforce those parameters using an automated tool. This is where Pylint shines.

Configuring Pylint

In order to start with an empty configuration, start your .pylintrc with

[MESSAGES CONTROL]

disable=all

This disables all Pylint messages. Since many of them are redundant, this makes sense. In Pylint, a message is a specific kind of warning.

You can check that all messages have been turned off by running pylint :

$ pylint <my package>

In general, it is not a great idea to add parameters to the pylint command-line: the best place to configure your pylint is the .pylintrc . In order to have it do something useful, we need to enable some messages.

To enable messages, add them to your .pylintrc under the [MESSAGES CONTROL] section:

enable = <message>,

...

For the "messages" (what Pylint calls different kinds of warnings) that look useful. Some of my favorites include too-many-lines , too-many-arguments , and too-many-branches . All of those limit complexity of modules or functions, and serve as an objective check, without a human nitpicker needed, for code complexity measurement.

A checker is a source of messages: every message belongs to exactly one checker. Many of the most useful messages are under the design checker. The default numbers are usually good, but tweaking the maximums is straightforward: we can add a section called DESIGN in the .pylintrc .

[ DESIGN ]

max-args=7

max-locals=15

Another good source of useful messages is the refactoring checker. Some of my favorite messages to enable there are consider-using-dict-comprehension , stop-iteration-return (which looks for generators that use raise StopIteration when return is the correct way to stop the iteration), and chained-comparison , which will suggest using syntax like 1 <= x < 5 rather than the less obvious 1 <= x and x < 5 .

Finally, an expensive checker, in terms of performance, but highly useful, is similarities . It is designed to enforce "Don't Repeat Yourself" (the DRY principle) by explicitly looking for copy-paste between different parts of the code. It only has one message to enable: duplicate-code . The default "minimum similarity lines" is set to 4 . It is possible to set it to a different value using the .pylintrc .

[ SIMILARITIES ]

min-similarity-lines=3

Pylint makes code reviews easy

If you are sick of code reviews where you point out that a class is too complicated, or that two different functions are basically the same, add Pylint to your Continuous Integration configuration, and only have the arguments about complexity guidelines for your project once .

[Nov 08, 2019] Manage NTP with Chrony by David Both

Dec 03, 2018 | opensource.com

Chronyd is a better choice for most networks than ntpd for keeping computers synchronized with the Network Time Protocol.

"Does anybody really know what time it is? Does anybody really care?"
Chicago , 1969

Perhaps that rock group didn't care what time it was, but our computers do need to know the exact time. Timekeeping is very important to computer networks. In banking, stock markets, and other financial businesses, transactions must be maintained in the proper order, and exact time sequences are critical for that. For sysadmins and DevOps professionals, it's easier to follow the trail of email through a series of servers or to determine the exact sequence of events using log files on geographically dispersed hosts when exact times are kept on the computers in question.

I used to work at an organization that received over 20 million emails per day and had four servers just to accept and do a basic filter on the incoming flood of email. From there, emails were sent to one of four other servers to perform more complex anti-spam assessments, then they were delivered to one of several additional servers where the emails were placed in the correct inboxes. At each layer, the emails would be sent to one of the next-level servers, selected only by the randomness of round-robin DNS. Sometimes we had to trace a new message through the system until we could determine where it "got lost," according to the pointy-haired bosses. We had to do this with frightening regularity.

Most of that email turned out to be spam. Some people actually complained that their [joke, cat pic, recipe, inspirational saying, or other-strange-email]-of-the-day was missing and asked us to find it. We did reject those opportunities.

Our email and other transactional searches were aided by log entries with timestamps that -- today -- can resolve down to the nanosecond in even the slowest of modern Linux computers. In very high-volume transaction environments, even a few microseconds of difference in the system clocks can mean sorting thousands of transactions to find the correct one(s).

The NTP server hierarchy

Computers worldwide use the Network Time Protocol (NTP) to synchronize their times with internet standard reference clocks via a hierarchy of NTP servers. The primary servers are at stratum 1, and they are connected directly to various national time services at stratum 0 via satellite, radio, or even modems over phone lines. The time service at stratum 0 may be an atomic clock, a radio receiver tuned to the signals broadcast by an atomic clock, or a GPS receiver using the highly accurate clock signals broadcast by GPS satellites.

To prevent time requests from time servers lower in the hierarchy (i.e., with a higher stratum number) from overwhelming the primary reference servers, there are several thousand public NTP stratum 2 servers that are open and available for anyone to use. Many organizations with large numbers of hosts that need an NTP server will set up their own time servers so that only one local host accesses the stratum 2 time servers, then they configure the remaining network hosts to use the local time server which, in my case, is a stratum 3 server.

NTP choices

The original NTP daemon, ntpd , has been joined by a newer one, chronyd . Both keep the local host's time synchronized with the time server. Both services are available, and I have seen nothing to indicate that this will change anytime soon.

Chrony has features that make it the better choice for most environments.

The NTP and Chrony RPM packages are available from standard Fedora repositories. You can install both and switch between them, but modern Fedora, CentOS, and RHEL releases have moved from NTP to Chrony as their default time-keeping implementation. I have found that Chrony works well, provides a better interface for the sysadmin, presents much more information, and increases control.

Just to make it clear, NTP is a protocol that is implemented with either NTP or Chrony. If you'd like to know more, read this comparison between NTP and Chrony as implementations of the NTP protocol.

This article explains how to configure Chrony clients and servers on a Fedora host, but the configuration for CentOS and RHEL current releases works the same.

Chrony structure

The Chrony daemon, chronyd , runs in the background and monitors the time and status of the time server specified in the chrony.conf file. If the local time needs to be adjusted, chronyd does it smoothly without the programmatic trauma that would occur if the clock were instantly reset to a new time.

Chrony's chronyc tool allows someone to monitor the current status of Chrony and make changes if necessary. The chronyc utility can be used as a command that accepts subcommands, or it can be used as an interactive text-mode program. This article will explain both uses.

Client configuration

The NTP client configuration is simple and requires little or no intervention. The NTP server can be defined during the Linux installation or provided by the DHCP server at boot time. The default /etc/chrony.conf file (shown below in its entirety) requires no intervention to work properly as a client. For Fedora, Chrony uses the Fedora NTP pool, and CentOS and RHEL have their own NTP server pools. Like many Red Hat-based distributions, the configuration file is well commented.

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool 2.fedora.pool.ntp.org iburst

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony.keys

# Get TAI-UTC offset and leap seconds from the system tz database.
leapsectz right/UTC

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

Let's look at the current status of NTP on a virtual machine I use for testing. The chronyc command, when used with the tracking subcommand, provides statistics that report how far off the local system is from the reference server.

[root@studentvm1 ~]# chronyc tracking
Reference ID : 23ABED4D (ec2-35-171-237-77.compute-1.amazonaws.com)
Stratum : 3
Ref time (UTC) : Fri Nov 16 16:21:30 2018
System time : 0.000645622 seconds slow of NTP time
Last offset : -0.000308577 seconds
RMS offset : 0.000786140 seconds
Frequency : 0.147 ppm slow
Residual freq : -0.073 ppm
Skew : 0.062 ppm
Root delay : 0.041452706 seconds
Root dispersion : 0.022665167 seconds
Update interval : 1044.2 seconds
Leap status : Normal
[root@studentvm1 ~]#

The Reference ID in the first line of the result is the server the host is synchronized to -- in this case, a stratum 3 reference server that was last contacted by the host at 16:21:30 2018. The other lines are described in the chronyc(1) man page .

The sources subcommand is also useful because it provides information about the time source configured in chrony.conf .

[root@studentvm1 ~]# chronyc sources
210 Number of sources = 5
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.0.51 3 6 377 0 -2613us[-2613us] +/- 63ms
^+ dev.smatwebdesign.com 3 10 377 28m -2961us[-3534us] +/- 113ms
^+ propjet.latt.net 2 10 377 465 -1097us[-1085us] +/- 77ms
^* ec2-35-171-237-77.comput> 2 10 377 83 +2388us[+2395us] +/- 95ms
^+ PBX.cytranet.net 3 10 377 507 -1602us[-1589us] +/- 96ms
[root@studentvm1 ~]#

The first source in the list is the time server I set up for my personal network. The others were provided by the pool. Even though my NTP server doesn't appear in the Chrony configuration file above, my DHCP server provides its IP address for the NTP server. The "S" column -- Source State -- indicates with an asterisk ( * ) the server our host is synced to. This is consistent with the data from the tracking subcommand.

The -v option provides a nice description of the fields in this output.

[root@studentvm1 ~]# chronyc sources -v
210 Number of sources = 5

.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| / '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
|| .- xxxx [ yyyy ] +/- zzzz
|| Reachability register (octal) -. | xxxx = adjusted offset,
|| Log2(Polling interval) --. | | yyyy = measured offset,
|| \ | | zzzz = estimated error.
|| | | \
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.0.51 3 7 377 28 -2156us[-2156us] +/- 63ms
^+ triton.ellipse.net 2 10 377 24 +5716us[+5716us] +/- 62ms
^+ lithium.constant.com 2 10 377 351 -820us[ -820us] +/- 64ms
^* t2.time.bf1.yahoo.com 2 10 377 453 -992us[ -965us] +/- 46ms
^- ntp.idealab.com 2 10 377 799 +3653us[+3674us] +/- 87ms
[root@studentvm1 ~]#

If I wanted my server to be the preferred reference time source for this host, I would add the line below to the /etc/chrony.conf file.

server 192.168.0.51 iburst prefer

I usually place this line just above the first pool server statement near the top of the file. There is no special reason for this, except I like to keep the server statements together. It would work just as well at the bottom of the file, and I have done that on several hosts. This configuration file is not sequence-sensitive.

The prefer option marks this as the preferred reference source. As such, this host will always be synchronized with this reference source (as long as it is available). We can also use the fully qualified hostname for a remote reference server or the hostname only (without the domain name) for a local reference time source as long as the search statement is set in the /etc/resolv.conf file. I prefer the IP address to ensure that the time source is accessible even if DNS is not working. In most environments, the server name is probably the better option, because NTP will continue to work even if the server's IP address changes.

If you don't have a specific reference source you want to synchronize to, it is fine to use the defaults.

Configuring an NTP server with Chrony

The nice thing about the Chrony configuration file is that this single file configures the host as both a client and a server. To add a server function to our host -- it will always be a client, obtaining its time from a reference server -- we just need to make a couple of changes to the Chrony configuration, then configure the host's firewall to accept NTP requests.

Open the /etc/chrony.conf file in your favorite text editor and uncomment the local stratum 10 line. This enables the Chrony NTP server to continue to act as if it were connected to a remote reference server if the internet connection fails; this enables the host to continue to be an NTP server to other hosts on the local network.

Let's restart chronyd and track how the service is working for a few minutes. Before we enable our host as an NTP server, we want to test a bit.

[root@studentvm1 ~]# systemctl restart chronyd ; watch chronyc tracking

The results should look like this. The watch command runs the chronyc tracking command every two seconds so we can watch changes occur over time.

Every 2.0s: chronyc tracking studentvm1: Fri Nov 16 20:59:31 2018

Reference ID : C0A80033 (192.168.0.51)
Stratum : 4
Ref time (UTC) : Sat Nov 17 01:58:51 2018
System time : 0.001598277 seconds fast of NTP time
Last offset : +0.001791533 seconds
RMS offset : 0.001791533 seconds
Frequency : 0.546 ppm slow
Residual freq : -0.175 ppm
Skew : 0.168 ppm
Root delay : 0.094823152 seconds
Root dispersion : 0.021242738 seconds
Update interval : 65.0 seconds
Leap status : Normal

Notice that my NTP server, the studentvm1 host, synchronizes to the host at 192.168.0.51, which is my internal network NTP server, at stratum 4. Synchronizing directly to the Fedora pool machines would result in synchronization at stratum 3. Notice also that the amount of error decreases over time. Eventually, it should stabilize with a tiny variation around a fairly small range of error. The size of the error depends upon the stratum and other network factors. After a few minutes, use Ctrl+C to break out of the watch loop.

To turn our host into an NTP server, we need to allow it to listen on the local network. Uncomment the following line to allow hosts on the local network to access our NTP server.

# Allow NTP client access from local network.
allow 192.168.0.0/16

Note that the server can listen for requests on any local network it's attached to. The IP address in the "allow" line is just intended for illustrative purposes. Be sure to change the IP network and subnet mask in that line to match your local network's.

Restart chronyd .

[root@studentvm1 ~]# systemctl restart chronyd

To allow other hosts on your network to access this server, configure the firewall to allow inbound UDP packets on port 123. Check your firewall's documentation to find out how to do that.

Testing

Your host is now an NTP server. You can test it with another host or a VM that has access to the network on which the NTP server is listening. Configure the client to use the new NTP server as the preferred server in the /etc/chrony.conf file, then monitor that client using the chronyc tools we used above.

Chronyc as an interactive tool

As I mentioned earlier, chronyc can be used as an interactive command tool. Simply run the command without a subcommand and you get a chronyc command prompt.

[root@studentvm1 ~]# chronyc
chrony version 3.4
Copyright (C) 1997-2003, 2007, 2009-2018 Richard P. Curnow and others
chrony comes with ABSOLUTELY NO WARRANTY. This is free software, and
you are welcome to redistribute it under certain conditions. See the
GNU General Public License version 2 for details.

chronyc>

You can enter just the subcommands at this prompt. Try using the tracking , ntpdata , and sources commands. The chronyc command line allows command recall and editing for chronyc subcommands. You can use the help subcommand to get a list of possible commands and their syntax.

Conclusion

Chrony is a powerful tool for synchronizing the times of client hosts, whether they are all on the local network or scattered around the globe. It's easy to configure because, despite the large number of options available, only a few configurations are required for most circumstances.

After my client computers have synchronized with the NTP server, I like to set the system hardware clock from the system (OS) time by using the following command:

/sbin/hwclock --systohc

This command can be added as a cron job or a script in cron.daily to keep the hardware clock synced with the system time.

Chrony and NTP (the service) use very similar configuration directives, and many statements can be carried over from one configuration file to the other. The man pages for chronyd , chronyc , and chrony.conf contain an amazing amount of information that can help you get started or learn about esoteric configuration options.

Do you run your own NTP server? Let us know in the comments and be sure to tell us which implementation you are using, NTP or Chrony.

[Nov 08, 2019] Quiet log noise with Python and machine learning by Tristan de Cacqueray

Sep 28, 2018 | opensource.com

The Logreduce machine learning model is trained using previous successful job runs to extract anomalies from failed runs' logs.

This principle can also be applied to other use cases, for example, extracting anomalies from Journald or other systemwide regular log files.

Using machine learning to reduce noise

A typical log file contains many nominal events ("baselines") along with a few exceptions that are relevant to the developer. Baselines may contain random elements such as timestamps or unique identifiers that are difficult to detect and remove. To remove the baseline events, we can use a k -nearest neighbors pattern recognition algorithm ( k -NN).

[diagram: generic machine learning workflow]

Log events must be converted to numeric values for k -NN regression. Using the generic feature extraction tool HashingVectorizer enables the process to be applied to any type of log. It hashes each word and encodes each event in a sparse matrix. To further reduce the search space, tokenization removes known random words, such as dates or IP addresses.

[diagram: feature extraction with HashingVectorizer]

Once the model is trained, the k -NN search tells us the distance of each new event from the baseline.

[diagram: k-NN distance of new events from the baseline]

This Jupyter notebook demonstrates the process and graphs the sparse matrix vectors.

[screenshot: anomaly detection notebook with scikit-learn]
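
Putting these pieces together, here is a rough sketch of the same idea using scikit-learn directly; this is not Logreduce's actual implementation, and the log file names and the 0.2 threshold are arbitrary:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neighbors import NearestNeighbors

# fit the model on a known-good baseline log
with open("baseline.log") as f:
    baseline = f.read().splitlines()
vectorizer = HashingVectorizer(n_features=2 ** 16)
model = NearestNeighbors(n_neighbors=1).fit(vectorizer.transform(baseline))

# score each line of the target log by its distance to the closest baseline line
with open("failed-job.log") as f:
    target = f.read().splitlines()
distances, _ = model.kneighbors(vectorizer.transform(target))

for distance, line in zip(distances[:, 0], target):
    if distance > 0.2:   # arbitrary threshold: far from every baseline line, so report as an anomaly
        print("%.3f | %s" % (distance, line))

In practice, Logreduce also tokenizes away dates, IP addresses, and other random words before vectorizing, which keeps such noise out of the baseline.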

Introducing Logreduce

The Logreduce Python software transparently implements this process. Logreduce's initial goal was to assist with Zuul CI job failure analyses using the build database, and it is now integrated into the Software Factory development forge's job logs process.

At its simplest, Logreduce compares files or directories and removes lines that are similar. Logreduce builds a model for each source file and outputs any of the target's lines whose distances are above a defined threshold by using the following syntax: distance | filename:line-number: line-content .

$ logreduce diff /var/log/audit/audit.log.1 /var/log/audit/audit.log
INFO  logreduce.Classifier - Training took 21.982s at 0.364MB/s (1.314kl/s) (8.000 MB - 28.884 kilo-lines)
0.244 | audit.log:19963: type=USER_AUTH acct="root" exe="/usr/bin/su" hostname=managesf.sftests.com
INFO  logreduce.Classifier - Testing took 18.297s at 0.306MB/s (1.094kl/s) (5.607 MB - 20.015 kilo-lines)
99.99% reduction (from 20015 lines to 1)

A more advanced Logreduce use can train a model offline to be reused. Many variants of the baselines can be used to fit the k -NN search tree.

$ logreduce dir-train audit.clf /var/log/audit/audit.log.*
INFO  logreduce.Classifier - Training took 80.883s at 0.396MB/s (1.397kl/s) (32.001 MB - 112.977 kilo-lines)
DEBUG logreduce.Classifier - audit.clf: written
$ logreduce dir-run audit.clf /var/log/audit/audit.log

Logreduce also implements interfaces to discover baselines for Journald time ranges (days/weeks/months) and Zuul CI job build histories. It can also generate HTML reports that group anomalies found in multiple files in a simple interface.

[screenshot: Logreduce HTML report]

Managing baselines

The key to using k-NN regression for anomaly detection is to have a database of known good baselines, which the model uses to detect lines that deviate too far. This method relies on the baselines containing all nominal events, as anything that isn't found in the baseline will be reported as anomalous.

CI jobs are great targets for k-NN regression because the job outputs are often deterministic and previous runs can be automatically used as baselines. Logreduce features Zuul job roles that can be used as part of a failed job post task in order to issue a concise report (instead of the full job's logs). This principle can be applied to other cases, as long as baselines can be constructed in advance. For example, a nominal system's SoS report can be used to find issues in a defective deployment.

baselines.png

Anomaly classification service

The next version of Logreduce introduces a server mode to offload log processing to an external service where reports can be further analyzed. It also supports importing existing reports and requests to analyze a Zuul build. The services run analyses asynchronously and feature a web interface to adjust scores and remove false positives.

classification-interface.png

Reviewed reports can be archived as a standalone dataset with the target log files and the scores for anomalous lines recorded in a flat JSON file.

Project roadmap

Logreduce is already being used effectively, but there are many opportunities for improving the tool. Plans for the future include:

If you are interested in getting involved in this project, please contact us on the #log-classify Freenode IRC channel. Feedback is always appreciated!


Tristan Cacqueray will present Reduce your log noise using machine learning at the OpenStack Summit, November 13-15 in Berlin.

[Nov 08, 2019] Getting started with Logstash by Jamie Riedesel

Nov 08, 2019 | opensource.com

No longer a simple log-processing pipeline, Logstash has evolved into a powerful and versatile data processing tool. Here are the basics to get you started.


Logstash, an open source tool released by Elastic, is designed to ingest and transform data. It was originally built to be a log-processing pipeline to ingest logging data into ElasticSearch. Several versions later, it can do much more.

At its core, Logstash is a form of Extract-Transform-Load (ETL) pipeline. Unstructured log data is extracted, filters transform it, and the results are loaded into some form of data store.

Logstash can take a line of text like this syslog example:

Sep 11 14:13:38 vorthys sshd[16998]: Received disconnect from 192.0.2.11 port 53730:11: disconnected by user

and transform it into a much richer data structure:

{
  "timestamp": "1505157218000",
  "host": "vorthys",
  "program": "sshd",
  "pid": "16998",
  "message": "Received disconnect from 192.0.2.11 port 53730:11: disconnected by user",
  "sshd_action": "disconnect",
  "sshd_tuple": "192.0.2.11:53730"
}

Depending on what you are using for your backing store, you can find events like this by using indexed fields rather than grepping terabytes of text. If you're generating tens to hundreds of gigabytes of logs a day, that matters.

Internal architecture

Logstash has a three-stage pipeline implemented in JRuby:

The input stage plugins extract data. This can be from logfiles, a TCP or UDP listener, one of several protocol-specific plugins such as syslog or IRC, or even queuing systems such as Redis, AMQP, or Kafka. This stage tags incoming events with metadata surrounding where the events came from.

The filter stage plugins transform and enrich the data. This is the stage that produces the sshd_action and sshd_tuple fields in the example above. This is where you'll find most of Logstash's value.

The output stage plugins load the processed events into something else, such as ElasticSearch or another document database, or a queuing system such as Redis, AMQP, or Kafka. It can also be configured to communicate with an API. It is also possible to hook up something like PagerDuty to your Logstash outputs.

Have a cron job that checks if your backups completed successfully? It can issue an alarm in the logging stream. This is picked up by an input, and a filter config set up to catch those events marks it up, allowing a conditional output to know this event is for it. This is how you can add alarms to scripts that would otherwise need to create their own notification layers, or that operate on systems that aren't allowed to communicate with the outside world.
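For instance, a backup cron job could be wrapped so that a failure emits a structured line into syslog, which the input stage then picks up like any other event. The script path, tag, and message format in this sketch are made up for illustration:

# Hypothetical wrapper for a nightly backup cron job: on failure it writes a
# structured line to syslog, which a Logstash syslog or file input will ingest.
import subprocess
import syslog

result = subprocess.run(["/usr/local/bin/run-backup.sh"])   # assumed backup script
if result.returncode != 0:
    syslog.openlog("backup-check", facility=syslog.LOG_CRON)
    # A filter can match on "BACKUP_FAILED", tag the event, and a conditional
    # output can then page someone -- no separate notification layer needed.
    syslog.syslog(syslog.LOG_ERR,
                  f"BACKUP_FAILED job=nightly rc={result.returncode}")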

Threads

In general, each input runs in its own thread. The filter and output stages are more complicated. In Logstash 1.5 through 2.1, the filter stage had a configurable number of threads, with the output stage occupying a single thread. That changed in Logstash 2.2, when the filter-stage threads were built to handle the output stage. With one fewer internal queue to keep track of, throughput improved with Logstash 2.2.

If you're running an older version, it's worth upgrading to at least 2.2. When we moved from 1.5 to 2.2, we saw a 20-25% increase in overall throughput. Logstash also spent less time in wait states, so we used more of the CPU (47% vs 75%).

Configuring the pipeline

Logstash can take a single file or a directory for its configuration. If a directory is given, it reads the files in lexical order. This is important, as ordering is significant for filter plugins (we'll discuss that in more detail later).

Here is a bare Logstash config file:

input { }
filter { }
output { }

Each of these will contain zero or more plugin configurations, and there can be multiple blocks.

Input config

An input section can look like this:

input {
syslog {
port => 514
type => "syslog_server"
}
}

This tells Logstash to open the syslog { } plugin on port 514 and will set the document type for each event coming in through that plugin to be syslog_server . This plugin follows RFC 3164 only, not the newer RFC 5424.

Here is a slightly more complex input block:

# Pull in syslog data
input {
file {
path => [
"/var/log/syslog" ,
"/var/log/auth.log"
]
type => "syslog"
}
}

# Pull in application log data. They emit data in JSON form.
input {
file {
path => [
"/var/log/app/worker_info.log" ,
"/var/log/app/broker_info.log" ,
"/var/log/app/supervisor.log"
]
exclude => "*.gz"
type => "applog"
codec => "json"
}
}

This one uses two different input { } blocks to call different invocations of the file { } plugin: one tracks system-level logs, the other tracks application-level logs. By using two different input { } blocks, a Java thread is spawned for each one. For a multi-core system, different cores keep track of the configured files; if one thread blocks, the other will continue to function.

Both of these file { } blocks could be put into the same input { } block; they would simply run in the same thread -- Logstash doesn't really care.

Filter config

The filter section is where you transform your data into something that's newer and easier to work with. Filters can get quite complex. Here are a few examples of filters that accomplish different goals:

filter {
if [ program ] == "metrics_fetcher" {
mutate {
add_tag => [ 'metrics' ]
}
}
}

In this example, if the program field, populated by the syslog plugin in the example input at the top, reads metrics_fetcher , then it tags the event metrics . This tag could be used in a later filter plugin to further enrich the data.

filter {
if "metrics" in [ tags ] {
kv {
source => "message"
target => "metrics"
}
}
}

This one runs only if metrics is in the list of tags. It then uses the kv { } plugin to populate a new set of fields based on the key=value pairs in the message field. These new keys are placed as sub-fields of the metrics field, allowing the text pages_per_second=42 faults=0 to become metrics.pages_per_second = 42 and metrics.faults = 0 on the event.

Why wouldn't you just put this in the same conditional that set the tag value? Because there are multiple ways an event could get the metrics tag -- this way, the kv filter will handle them all.

Because the filters are ordered, being sure that the filter plugin that defines the metrics tag is run before the conditional that checks for it is important. Here are guidelines to ensure your filter sections are optimally ordered:

  1. Your early filters should apply as much metadata as possible.
  2. Using the metadata, perform detailed parsing of events.
  3. In your late filters, regularize your data to reduce problems downstream.
    • Ensure field data types get cast to a unified value. priority could be boolean, integer, or string.
      • Some systems, including ElasticSearch, will quietly convert types for you. Sending strings into a boolean field won't give you the results you want.
      • Other systems will reject a value outright if it isn't in the right data type.
    • The mutate { } plugin is helpful here, as it has methods to coerce fields into specific data types.

Here are useful plugins to extract fields from long strings:

Output config

Elastic would like you to send it all into ElasticSearch, but anything that can accept a JSON document, or the data structure it represents, can be an output. Keep in mind that events can be sent to multiple outputs. Consider this example of metrics:

output {
# Send to the local ElasticSearch port, and rotate the index daily.
elasticsearch {
hosts => [
"localhost" ,
"logelastic.prod.internal"
]
template_name => "logstash"
index => "logstash-{+YYYY.MM.dd}"
}

if "metrics" in [ tags ] {
influxdb {
host => "influx.prod.internal"
db => "logstash"
measurement => "appstats"
# This next bit only works because it is already a hash.
data_points => "%{metrics}"
send_as_tags => [ 'environment' , 'application' ]
}
}
}

Remember the metrics example above? This is how we can output it. The events tagged metrics will get sent to ElasticSearch in their full event form. In addition, the subfields under the metrics field on that event will be sent to influxdb , in the logstash database, under the appstats measurement. Along with the measurements, the values of the environment and application fields will be submitted as indexed tags.

There are a great many outputs; the two shown here are just examples, and many more output plugins are available.

[Nov 08, 2019] 10 resources every sysadmin should know about Opensource.com

Nov 08, 2019 | opensource.com

Cheat

Having a hard time remembering a command? Normally you might resort to a man page, but some man pages have a hard time getting to the point. It's the reason Chris Allen Lane came up with the idea (and more importantly, the code) for a cheat command.

The cheat command displays cheatsheets for common tasks in your terminal. It's a man page without the preamble. It cuts to the chase and tells you exactly how to do whatever it is you're trying to do. And if it lacks a common example that you think ought to be included, you can submit an update.

$ cheat tar
# To extract an uncompressed archive:
tar -xvf '/path/to/foo.tar'

# To extract a .gz archive:
tar -xzvf '/path/to/foo.tgz'
[ ... ]

You can also treat cheat as a local cheatsheet system, which is great for all the in-house commands you and your team have invented over the years. You can easily add a local cheatsheet to your own home directory, and cheat will find and display it just as if it were a popular system command.

[Nov 08, 2019] 5 alerting and visualization tools for sysadmins

Nov 08, 2019 | opensource.com

Common types of alerts and visualizations

Alerts

Let's first cover what alerts are not . Alerts should not be sent if the human responder can't do anything about the problem. This includes alerts that are sent to multiple individuals with only a few who can respond, or situations where every anomaly in the system triggers an alert. This leads to alert fatigue and receivers ignoring all alerts within a specific medium until the system escalates to a medium that isn't already saturated.

For example, if an operator receives hundreds of emails a day from the alerting system, that operator will soon ignore all emails from the alerting system. The operator will respond to a real incident only when he or she is experiencing the problem, emailed by a customer, or called by the boss. In this case, alerts have lost their meaning and usefulness.

Alerts are not a constant stream of information or a status update. They are meant to convey a problem from which the system can't automatically recover, and they are sent only to the individual most likely to be able to recover the system. Everything that falls outside this definition isn't an alert and will only damage your employees and company culture.

Everyone has a different set of alert types, so I won't discuss things like priority levels (P1-P5) or models that use words like "Informational," "Warning," and "Critical." Instead, I'll describe the generic categories emergent in complex systems' incident response.

You might have noticed I mentioned an "Informational" alert type right after I wrote that alerts shouldn't be informational. Well, not everyone agrees, but I don't consider something an alert if it isn't sent to anyone. It is a data point that many systems refer to as an alert. It represents some event that should be known but not responded to. It is generally part of the visualization system of the alerting tool and not an event that triggers actual notifications. Mike Julian covers this and other aspects of alerting in his book Practical Monitoring . It's a must read for work in this area.

Non-informational alerts consist of types that can be responded to or require action. I group these into two categories: internal outage and external outage. (Most companies have more than two levels for prioritizing their response efforts.) Degraded system performance is considered an outage in this model, as the impact to each user is usually unknown.

Internal outages are a lower priority than external outages, but they still need to be responded to quickly. They often include internal systems that company employees use or components of applications that are visible only to company employees.

External outages consist of any system outage that would immediately impact a customer. These don't include a system outage that prevents releasing updates to the system. They do include customer-facing application failures, database outages, and networking partitions that hurt availability or consistency if either can impact a user. They also include outages of tools that may not have a direct impact on users, as the application continues to run but this transparent dependency impacts performance. This is common when the system uses some external service or data source that isn't necessary for full functionality but may cause delays as the application performs retries or handles errors from this external dependency.

Visualizations

There are many visualization types, and I won't cover them all here. It's a fascinating area of research. On the data analytics side of my career, learning and applying that knowledge is a constant challenge. We need to provide simple representations of complex system outputs for the widest dissemination of information. Google Charts and Tableau have a wide selection of visualization types. We'll cover the most common visualizations and some innovative solutions for quickly understanding systems.

Line chart

The line chart is probably the most common visualization. It does a pretty good job of producing an understanding of a system over time. A line chart in a metrics system would have a line for each unique metric or some aggregation of metrics. This can get confusing when there are a lot of metrics in the same dashboard (as shown below), but most systems can select specific metrics to view rather than having all of them visible. Also, anomalous behavior is easy to spot if it's significant enough to escape the noise of normal operations. Below we can see purple, yellow, and light blue lines that might indicate anomalous behavior.

monitoring_guide_line_chart.png

Another feature of a line chart is that you can often stack them to show relationships. For example, you might want to look at requests on each server individually, but also in aggregate. This allows you to understand the overall system as well as each instance in the same graph.

monitoring_guide_line_chart_aggregate.png

Heatmaps

Another common visualization is the heatmap. It is useful when looking at histograms. This type of visualization is similar to a bar chart but can show gradients within the bars representing the different percentiles of the overall metric. For example, suppose you're looking at request latencies and you want to quickly understand the overall trend as well as the distribution of all requests. A heatmap is great for this, and it can use color to disambiguate the quantity of each section with a quick glance.

The heatmap below shows the higher concentration around the centerline of the graph with an easy-to-understand visualization of the distribution vertically for each time bucket. We might want to review a couple of points in time where the distribution gets wide while the others are fairly tight like at 14:00. This distribution might be a negative performance indicator.

monitoring_guide_histogram.png

Gauges

The last common visualization I'll cover here is the gauge, which helps users understand a single metric quickly. Gauges can represent a single metric, like your speedometer represents your driving speed or your gas gauge represents the amount of gas in your car. Similar to the gas gauge, most monitoring gauges clearly indicate what is good and what isn't. Often (as is shown below), good is represented by green, getting worse by orange, and "everything is breaking" by red. The middle row below shows traditional gauges.

monitoring_guide_gauges.png Image source: Grafana.org (© Grafana Labs)

This image shows more than just traditional gauges. The other gauges are single stat representations that are similar to the function of the classic gauge. They all use the same color scheme to quickly indicate system health with just a glance. Arguably, the bottom row is probably the best example of a gauge that allows you to glance at a dashboard and know that everything is healthy (or not). This type of visualization is usually what I put on a top-level dashboard. It offers a full, high-level understanding of system health in seconds.

Flame graphs

A less common visualization is the flame graph, introduced by Netflix's Brendan Gregg in 2011. It's not ideal for dashboarding or quickly observing high-level system concerns; it's normally seen when trying to understand a specific application problem. This visualization focuses on CPU and memory and the associated frames. The X-axis lists the frames alphabetically, and the Y-axis shows stack depth. Each rectangle is a stack frame and includes the function being called. The wider the rectangle, the more it appears in the stack. This method is invaluable when trying to diagnose system performance at the application level and I urge everyone to give it a try.

monitoring_guide_flame_graph.png Image source: Wikimedia.org (Creative Commons BY SA 3.0)

Tool options

There are several commercial options for alerting, but since this is Opensource.com, I'll cover only systems that are being used at scale by real companies that you can use at no cost. Hopefully, you'll be able to contribute new and innovative features to make these systems even better.

Alerting tools

Bosun

If you've ever done anything with computers and gotten stuck, the help you received was probably thanks to a Stack Exchange system. Stack Exchange runs many different websites around a crowdsourced question-and-answer model. Stack Overflow is very popular with developers, and Super User is popular with operations. However, there are now hundreds of sites ranging from parenting to sci-fi and philosophy to bicycles.

Stack Exchange open-sourced its alert management system, Bosun , around the same time Prometheus and its AlertManager system were released. There were many similarities in the two systems, and that's a really good thing. Like Prometheus, Bosun is written in Golang. Bosun's scope is more extensive than Prometheus' as it can interact with systems beyond metrics aggregation. It can also ingest data from log and event aggregation systems. It supports Graphite, InfluxDB, OpenTSDB, and Elasticsearch.

Bosun's architecture consists of a single server binary, a backend like OpenTSDB, Redis, and scollector agents . The scollector agents automatically detect services on a host and report metrics for those processes and other system resources. This data is sent to a metrics backend. The Bosun server binary then queries the backends to determine if any alerts need to be fired. Bosun can also be used by tools like Grafana to query the underlying backends through one common interface. Redis is used to store state and metadata for Bosun.

A really neat feature of Bosun is that it lets you test your alerts against historical data. This was something I missed in Prometheus several years ago, when I had data for an issue I wanted alerts on but no easy way to test it. To make sure my alerts were working, I had to create and insert dummy data. This system alleviates that very time-consuming process.

Bosun also has the usual features like showing simple graphs and creating alerts. It has a powerful expression language for writing alerting rules. However, it only has email and HTTP notification configurations, which means connecting to Slack and other tools requires a bit more customization ( which its documentation covers ). Similar to Prometheus, Bosun can use templates for these notifications, which means they can look as awesome as you want them to. You can use all your HTML and CSS skills to create the baddest email alert anyone has ever seen.

Cabot

Cabot was created by a company called Arachnys . You may not know who Arachnys is or what it does, but you have probably felt its impact: It built the leading cloud-based solution for fighting financial crimes. That sounds pretty cool, right? At a previous company, I was involved in similar functions around "know your customer" laws. Most companies would consider it a very bad thing to be linked to a terrorist group, for example, funneling money through their systems. These solutions also help defend against less-atrocious offenders like fraudsters who could also pose a risk to the institution.

So why did Arachnys create Cabot? Well, it is kind of a Christmas present to everyone, as it was a Christmas project built because its developers couldn't wrap their heads around Nagios . And really, who can blame them? Cabot was written with Django and Bootstrap, so it should be easy for most to contribute to the project. (Another interesting factoid: The name comes from the creator's dog.)

The Cabot architecture is similar to Bosun in that it doesn't collect any data. Instead, it accesses data through the APIs of the tools it is alerting for. Therefore, Cabot uses a pull (rather than a push) model for alerting. It reaches out into each system's API and retrieves the information it needs to make a decision based on a specific check. Cabot stores the alerting data in a Postgres database and also has a cache using Redis.

Cabot natively supports Graphite , but it also supports Jenkins , which is rare in this area. Arachnys uses Jenkins like a centralized cron, but I like this idea of treating build failures like outages. Obviously, a build failure isn't as critical as a production outage, but it could still alert the team and escalate if the failure isn't resolved. Who actually checks Jenkins every time an email comes in about a build failure? Yeah, me too!

Another interesting feature is that Cabot can integrate with Google Calendar for on-call rotations. Cabot calls this feature Rota, which is a British term for a roster or rotation. This makes a lot of sense, and I wish other systems would take this idea further. Cabot doesn't support anything more complex than primary and backup personnel, but there is certainly room for additional features. The docs say if you want something more advanced, you should look at a commercial option.

StatsAgg

StatsAgg ? How did that make the list? Well, it's not every day you come across a publishing company that has created an alerting platform. I think that deserves recognition. Of course, Pearson isn't just a publishing company anymore; it has several web presences and a joint venture with O'Reilly Media . However, I still think of it as the company that published my schoolbooks and tests.

StatsAgg isn't just an alerting platform; it's also a metrics aggregation platform. And it's kind of like a proxy for other systems. It supports Graphite, StatsD, InfluxDB, and OpenTSDB as inputs, but it can also forward those metrics to their respective platforms. This is an interesting concept, but potentially risky as loads increase on a central service. However, if the StatsAgg infrastructure is robust enough, it can still produce alerts even when a backend storage platform has an outage.

StatsAgg is written in Java and consists only of the main server and UI, which keeps complexity to a minimum. It can send alerts based on regular expression matching and is focused on alerting by service rather than host or instance. Its goal is to fill a void in the open source observability stack, and I think it does that quite well.

Visualization tools

Grafana

Almost everyone knows about Grafana , and many have used it. I have used it for years whenever I need a simple dashboard. The tool I used before was deprecated, and I was fairly distraught about that until Grafana made it okay. Grafana was gifted to us by Torkel Ödegaard. Like Cabot, Grafana was also created around Christmastime, and released in January 2014. It has come a long way in just a few years. It started life as a Kibana dashboarding system, and Torkel forked it into what became Grafana.

Grafana's sole focus is presenting monitoring data in a more usable and pleasing way. It can natively gather data from Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. There's an Enterprise version that uses plugins for more data sources, but there's no reason those other data source plugins couldn't be created as open source, as the Grafana plugin ecosystem already offers many other data sources.

What does Grafana do for me? It provides a central location for understanding my system. It is web-based, so anyone can access the information, although it can be restricted using different authentication methods. Grafana can provide knowledge at a glance using many different types of visualizations. However, it has started integrating alerting and other features that aren't traditionally combined with visualizations.

Now you can set alerts visually. That means you can look at a graph, maybe even one showing where an alert should have triggered due to some degradation of the system, click on the graph where you want the alert to trigger, and then tell Grafana where to send the alert. That's a pretty powerful addition that won't necessarily replace an alerting platform, but it can certainly help augment it by providing a different perspective on alerting criteria.

Grafana has also introduced more collaboration features. Users have been able to share dashboards for a long time, meaning you don't have to create your own dashboard for your Kubernetes cluster because there are several already available -- with some maintained by Kubernetes developers and others by Grafana developers.

The most significant addition around collaboration is annotations. Annotations allow a user to add context to part of a graph. Other users can then use this context to understand the system better. This is an invaluable tool when a team is in the middle of an incident and communication and common understanding are critical. Having all the information right where you're already looking makes it much more likely that knowledge will be shared across the team quickly. It's also a nice feature to use during blameless postmortems when the team is trying to understand how the failure occurred and learn more about their system.

Vizceral

Netflix created Vizceral to understand its traffic patterns better when performing a traffic failover. Unlike Grafana, which is a more general tool, Vizceral serves a very specific use case. Netflix no longer uses this tool internally and says it is no longer actively maintained, but it still updates the tool periodically. I highlight it here primarily to point out an interesting visualization mechanism and how it can help solve a problem. It's worth running it in a demo environment just to better grasp the concepts and witness what's possible with these systems.

[Nov 08, 2019] What breaks our systems: A taxonomy of black swans by Laura Nolan

Oct 25, 2018 | opensource.com
Find and fix outlier events that create issues before they trigger severe production problems.

Black swans are a metaphor for outlier events that are severe in impact (like the 2008 financial crash). In production systems, these are the incidents that trigger problems that you didn't know you had, cause major visible impact, and can't be fixed quickly and easily by a rollback or some other standard response from your on-call playbook. They are the events you tell new engineers about years after the fact.

Black swans, by definition, can't be predicted, but sometimes there are patterns we can find and use to create defenses against categories of related problems.

For example, a large proportion of failures are a direct result of changes (code, environment, or configuration). Each bug triggered in this way is distinctive and unpredictable, but the common practice of canarying all changes is somewhat effective against this class of problems, and automated rollbacks have become a standard mitigation.

As our profession continues to mature, other kinds of problems are becoming well-understood classes of hazards with generalized prevention strategies.

Black swans observed in the wild

All technology organizations have production problems, but not all of them share their analyses. The organizations that publicly discuss incidents are doing us all a service. The following incidents describe one class of a problem and are by no means isolated instances. We all have black swans lurking in our systems; it's just some of us don't know it yet.

Hitting limits


Running headlong into any sort of limit can produce very severe incidents. A canonical example of this was Instapaper's outage in February 2017 . I challenge any engineer who has carried a pager to read the outage report without a chill running up their spine. Instapaper's production database was on a filesystem that, unknown to the team running the service, had a 2TB limit. With no warning, it stopped accepting writes. Full recovery took days and required migrating its database.

Limits can strike in various ways. Sentry hit limits on maximum transaction IDs in Postgres. Platform.sh hit size limits on a pipe buffer. SparkPost triggered AWS's DDoS protection. Foursquare hit a performance cliff when one of its datastores ran out of RAM.

One way to get advance knowledge of system limits is to test periodically. Good load testing (on a production replica) ought to involve write transactions and should involve growing each datastore beyond its current production size. It's easy to forget to test things that aren't your main datastores (such as Zookeeper). If you hit limits during testing, you have time to fix the problems. Given that resolution of limits-related issues can involve major changes (like splitting a datastore), time is invaluable.

When it comes to cloud services, if your service generates unusual loads or uses less widely used products or features (such as older or newer ones), you may be more at risk of hitting limits. It's worth load testing these, too. But warn your cloud provider first.

Finally, where limits are known, add monitoring (with associated documentation) so you will know when your systems are approaching those ceilings. Don't rely on people still being around to remember.

Spreading slowness
"The world is much more correlated than we give credit to. And so we see more of what Nassim Taleb calls 'black swan events' -- rare events happen more often than they should because the world is more correlated."
-- Richard Thaler

HostedGraphite's postmortem on how an AWS outage took down its load balancers (which are not hosted on AWS) is a good example of just how much correlation exists in distributed computing systems. In this case, the load-balancer connection pools were saturated by slow connections from customers that were hosted in AWS. The same kinds of saturation can happen with application threads, locks, and database connections -- any kind of resource monopolized by slow operations.

HostedGraphite's incident is an example of externally imposed slowness, but often slowness can result from saturation somewhere in your own system creating a cascade and causing other parts of your system to slow down. An incident at Spotify demonstrates such spread -- the streaming service's frontends became unhealthy due to saturation in a different microservice. Enforcing deadlines for all requests, as well as limiting the length of request queues, can prevent such spread. Your service will serve at least some traffic, and recovery will be easier because fewer parts of your system will be broken.

Retries should be limited with exponential backoff and some jitter. An outage at Square, in which its Redis datastore became overloaded due to a piece of code that retried failed transactions up to 500 times with no backoff, demonstrates the potential severity of excessive retries. The Circuit Breaker design pattern can be helpful here, too.
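A minimal generic sketch of capped exponential backoff with full jitter (not taken from any of the incident reports above) looks something like this:

import random
import time

def call_with_backoff(operation, max_attempts=5, base=0.5, cap=30.0):
    """Retry operation(), sleeping up to base * 2**attempt seconds (capped),
    with full jitter so that many clients don't retry in lockstep."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                  # bounded number of tries
            delay = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, delay))       # full jitter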

Dashboards should be designed to clearly show utilization, saturation, and errors for all resources so problems can be found quickly.

Thundering herds

Often, failure scenarios arise when a system is under unusually heavy load. This can arise organically from users, but often it arises from systems. A surge of cron jobs that starts at midnight is a venerable example. Mobile clients can also be a source of coordinated demand if they are programmed to fetch updates at the same time (of course, it is much better to jitter such requests).

Events occurring at pre-configured times aren't the only source of thundering herds. Slack experienced multiple outages over a short time due to large numbers of clients being disconnected and immediately reconnecting, causing large spikes of load. CircleCI saw a severe outage when a GitLab outage ended, leading to a surge of builds queued in its database, which became saturated and very slow.

Almost any service can be the target of a thundering herd. Planning for such eventualities -- and testing that your plan works as intended -- is therefore a must. Client backoff and load shedding are often core to such approaches.

If your systems must constantly ingest data that can't be dropped, it's key to have a scalable way to buffer this data in a queue for later processing.
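One rough way to do that is to spool incoming items to local disk and let a separate worker drain the spool, so a slow or unavailable backend does not force you to drop data. The paths and the process_batch callback in this sketch are placeholders, and a production queue (Kafka, RabbitMQ, and so on) would be a more robust choice:

# Sketch of a disk-backed spool: the fast path only appends to a local file;
# a worker rotates the file and processes it, so ingestion survives backend outages.
import json
import pathlib
import time

SPOOL = pathlib.Path("/var/spool/myapp")                 # placeholder directory

def enqueue(event: dict) -> None:
    SPOOL.mkdir(parents=True, exist_ok=True)
    with open(SPOOL / "events.jsonl", "a") as spool:
        spool.write(json.dumps(event) + "\n")            # cheap, local append

def drain(process_batch) -> None:
    spool_file = SPOOL / "events.jsonl"
    while True:
        if spool_file.exists() and spool_file.stat().st_size > 0:
            batch_file = SPOOL / f"batch-{int(time.time())}.jsonl"
            spool_file.rename(batch_file)                # rotate before processing
            events = [json.loads(l) for l in batch_file.read_text().splitlines()]
            process_batch(events)                        # placeholder callback
            batch_file.unlink()
        time.sleep(1)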

Automation systems are complex systems
"Complex systems are intrinsically hazardous systems."
-- Richard Cook, MD

The trend for the past several years has been strongly towards more automation of software operations. Automation of anything that can reduce your system's capacity (e.g., erasing disks, decommissioning devices, taking down serving jobs) needs to be done with care. Accidents (due to bugs or incorrect invocations) with this kind of automation can take down your system very efficiently, potentially in ways that are hard to recover from.

Christina Schulman and Etienne Perot of Google describe some examples in their talk Help Protect Your Data Centers with Safety Constraints . One incident sent Google's entire in-house content delivery network (CDN) to disk-erase.

Schulman and Perot suggest using a central service to manage constraints, which limits the pace at which destructive automation can operate, and being aware of system conditions (for example, avoiding destructive operations if the service has recently had an alert).

Automation systems can also cause havoc when they interact with operators (or with other automated systems). Reddit experienced a major outage when its automation restarted a system that operators had stopped for maintenance. Once you have multiple automation systems, their potential interactions become extremely complex and impossible to predict.

It will help to deal with the inevitable surprises if all this automation writes logs to an easily searchable, central place. Automation systems should always have a mechanism to allow them to be quickly turned off (fully or only for a subset of operations or targets).

Defense against the dark swans

These are not the only black swans that might be waiting to strike your systems. There are many other kinds of severe problem that can be avoided using techniques such as canarying, load testing, chaos engineering, disaster testing, and fuzz testing -- and of course designing for redundancy and resiliency. Even with all that, at some point your system will fail.

To ensure your organization can respond effectively, make sure your key technical staff and your leadership have a way to coordinate during an outage. For example, one unpleasant issue you might have to deal with is a complete outage of your network. It's important to have a fail-safe communications channel completely independent of your own infrastructure and its dependencies. For instance, if you run on AWS, using a service that also runs on AWS as your fail-safe communication method is not a good idea. A phone bridge or an IRC server that runs somewhere separate from your main systems is good. Make sure everyone knows what the communications platform is and practices using it.

Another principle is to ensure that your monitoring and your operational tools rely on your production systems as little as possible. Separate your control and your data planes so you can make changes even when systems are not healthy. Don't use a single message queue for both data processing and config changes or monitoring, for example -- use separate instances. In SparkPost: The Day the DNS Died , Jeremy Blosser presents an example where critical tools relied on the production DNS setup, which failed.

The psychology of battling the black swan

Dealing with major incidents in production can be stressful. It really helps to have a structured incident-management process in place for these situations. Many technology organizations (including Google) successfully use a version of FEMA's Incident Command System. There should be a clear way for any on-call individual to call for assistance in the event of a major problem they can't resolve alone.

For long-running incidents, it's important to make sure people don't work for unreasonable lengths of time and get breaks to eat and sleep (uninterrupted by a pager). It's easy for exhausted engineers to make a mistake or overlook something that might resolve the incident faster.

Learn more

There are many other things that could be said about black (or formerly black) swans and strategies for dealing with them. If you'd like to learn more, I highly recommend these two books dealing with resilience and stability in production: Susan Fowler's Production-Ready Microservices and Michael T. Nygard's Release It! .


Laura Nolan will present What Breaks Our Systems: A Taxonomy of Black Swans at LISA18 , October 29-31 in Nashville, Tennessee, USA.

[Nov 08, 2019] My first sysadmin mistake by Jim Hall

Wiping out the /etc directory is something sysadmins do accidentally. It often happens when another directory is named etc, for example /Backup/etc: you put a slash in front of etc automatically, because it is ingrained in your mind, and only then face the consequences. If you do not use saferm, the results are pretty devastating. In most cases the server does not die, but new logins become impossible; existing SSH sessions survive. That's why it is important to back up /etc at the first login to the server. On modern servers it takes a couple of seconds.
If the subdirectories are intact, you can still copy the content from another server. But the content of the sysconfig subdirectory in Linux is unique to the server, and you need a backup to restore it.
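A minimal sketch of such a first-login backup, using Python's tarfile module (the destination path is just an example):

# Snapshot /etc into a timestamped tarball before touching anything on the box.
import tarfile
import time

dest = f"/root/etc-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"   # example path
with tarfile.open(dest, "w:gz") as tar:
    tar.add("/etc", arcname="etc")    # archives the whole tree, owners and modes included
print(f"Saved {dest}")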
Notable quotes:
"... As root. I thought I was deleting some stale cache files for one of our programs. Instead, I wiped out all files in the /etc directory by mistake. Ouch. ..."
"... I put together a simple strategy: Don't reboot the server. Use an identical system as a template, and re-create the ..."
Nov 08, 2019 | opensource.com
I ran the rm command in the wrong directory. As root. I thought I was deleting some stale cache files for one of our programs. Instead, I wiped out all files in the /etc directory by mistake. Ouch.

My clue that I'd done something wrong was an error message that rm couldn't delete certain subdirectories. But the cache directory should contain only files! I immediately stopped the rm command and looked at what I'd done. And then I panicked. All at once, a million thoughts ran through my head. Did I just destroy an important server? What was going to happen to the system? Would I get fired?

Fortunately, I'd run rm * and not rm -rf * so I'd deleted only files. The subdirectories were still there. But that didn't make me feel any better.

Immediately, I went to my supervisor and told her what I'd done. She saw that I felt really dumb about my mistake, but I owned it. Despite the urgency, she took a few minutes to do some coaching with me. "You're not the first person to do this," she said. "What would someone else do in your situation?" That helped me calm down and focus. I started to think less about the stupid thing I had just done, and more about what I was going to do next.

I put together a simple strategy: Don't reboot the server. Use an identical system as a template, and re-create the /etc directory.

Once I had my plan of action, the rest was easy. It was just a matter of running the right commands to copy the /etc files from another server and edit the configuration so it matched the system. Thanks to my practice of documenting everything, I used my existing documentation to make any final adjustments. I avoided having to completely restore the server, which would have meant a huge disruption.

To be sure, I learned from that mistake. For the rest of my years as a systems administrator, I always confirmed what directory I was in before running any command.

I also learned the value of building a "mistake strategy." When things go wrong, it's natural to panic and think about all the bad things that might happen next. That's human nature. But creating a "mistake strategy" helps me stop worrying about what just went wrong and focus on making things better. I may still think about it, but knowing my next steps allows me to "get over it."

[Nov 08, 2019] 13 open source backup solutions by Don Watkins

This is mostly just the list; you need to do your own research. Some important backup applications are not mentioned, it is unclear what backup methods each uses, and it is not explained why each of them is preferable to tar. The stress in the list is on portability (Linux plus Mac and Windows, not just Linux).
Mar 07, 2019 | opensource.com

Recently, we published a poll that asked readers to vote on their favorite open source backup solution. We offered six solutions recommended by our moderator community -- Cronopete, Deja Dup, Rclone, Rdiff-backup, Restic, and Rsync -- and invited readers to share other options in the comments. And you came through, offering 13 other solutions (so far) that we either hadn't considered or hadn't even heard of.

By far the most popular suggestion was BorgBackup . It is a deduplicating backup solution that features compression and encryption. It is supported on Linux, MacOS, and BSD and has a BSD License.

Second was UrBackup , which does full and incremental image and file backups; you can save whole partitions or single directories. It has clients for Windows, Linux, and MacOS and has a GNU Affero Public License.

Third was LuckyBackup ; according to its website, "it is simple to use, fast (transfers over only changes made and not all data), safe (keeps your data safe by checking all declared directories before proceeding in any data manipulation), reliable, and fully customizable." It carries a GNU Public License.

Casync is content-addressable synchronization -- it's designed for backup and synchronizing and stores and retrieves multiple related versions of large file systems. It is licensed with the GNU Lesser Public License.

Syncthing synchronizes files between two computers. It is licensed with the Mozilla Public License and, according to its website, is secure and private. It works on MacOS, Windows, Linux, FreeBSD, Solaris, and OpenBSD.

Duplicati is a free backup solution that works on Windows, MacOS, and Linux and a variety of standard protocols, such as FTP, SSH, and WebDAV, and cloud services. It features strong encryption and is licensed with the GPL.

Dirvish is a disk-based virtual image backup system licensed under OSL-3.0. It also requires Rsync, Perl5, and SSH to be installed.

Bacula 's website says it "is a set of computer programs that permits the system administrator to manage backup, recovery, and verification of computer data across a network of computers of different kinds." It is supported on Linux, FreeBSD, Windows, MacOS, OpenBSD, and Solaris and the bulk of its source code is licensed under AGPLv3.

BackupPC "is a high-performance, enterprise-grade system for backing up Linux, Windows, and MacOS PCs and laptops to a server's disk," according to its website. It is licensed under the GPLv3.

Amanda is a backup system written in C and Perl that allows a system administrator to back up an entire network of client machines to a single server using tape, disk, or cloud-based systems. It was developed and copyrighted in 1991 at the University of Maryland and has a BSD-style license.

Back in Time is a simple backup utility designed for Linux. It provides a command line client and a GUI, both written in Python. To do a backup, just specify where to store snapshots, what folders to back up, and the frequency of the backups. BackInTime is licensed with GPLv2.

Timeshift is a backup utility for Linux that is similar to System Restore for Windows and Time Capsule for MacOS. According to its GitHub repository, "Timeshift protects your system by taking incremental snapshots of the file system at regular intervals. These snapshots can be restored at a later date to undo all changes to the system."

Kup is a backup solution that was created to help users back up their files to a USB drive, but it can also be used to perform network backups. According to its GitHub repository, "When you plug in your external hard drive, Kup will automatically start copying your latest changes."

[Nov 08, 2019] What you probably didn't know about sudo

Nov 08, 2019 | opensource.com

Enable features for a certain group of users

The sudo command comes with a huge set of defaults. Still, there are situations when you want to override some of these. This is when you use the Defaults statement in the configuration. Usually, these defaults are enforced on every user, but you can narrow the setting down to a subset of users based on host, username, and so on. Here is an example that my generation of sysadmins loves to hear about: insults. These are just some funny messages for when someone mistypes a password:

czanik@linux-mewy:~> sudo ls
[sudo] password for root:
Hold it up to the light --- not a brain in sight!
[sudo] password for root:
My pet ferret can type better than you!
[sudo] password for root:
sudo: 3 incorrect password attempts
czanik@linux-mewy:~>

Because not everyone is a fan of sysadmin humor, these insults are disabled by default. The following example shows how to enable this setting only for your seasoned sysadmins, who are members of the wheel group:

Defaults !insults
Defaults:%wheel insults

I do not have enough fingers to count how many people thanked me for bringing these messages back.

Digest verification

There are, of course, more serious features in sudo as well. One of them is digest verification. You can include the digest of applications in your configuration:

peter ALL = sha224:11925141bb22866afdf257ce7790bd6275feda80b3b241c108b79c88 /usr/bin/passwd

In this case, sudo checks and compares the digest of the application to the one stored in the configuration before running the application. If they do not match, sudo refuses to run the application. While it is difficult to maintain this information in your configuration -- there are no automated tools for this purpose -- these digests can provide you with an additional layer of protection.
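Generating a digest in the format sudoers expects is easy enough. For example, here is a small sketch that hashes a binary and prints a matching rule (the user and path are taken from the example above):

# Compute the SHA-224 hex digest of a binary for use in a sudoers digest rule.
import hashlib

path = "/usr/bin/passwd"
with open(path, "rb") as binary:
    digest = hashlib.sha224(binary.read()).hexdigest()
print(f"peter ALL = sha224:{digest} {path}")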

Session recording

Session recording is also a lesser-known feature of sudo. After my demo, many people leave my talk with plans to implement it on their infrastructure. Why? Because with session recording, you see not just the command name, but also everything that happened in the terminal. You can see what your admins are doing even if they have shell access and logs only show that bash is started.

There is one limitation, currently. Records are stored locally, so with enough permissions, users can delete their traces. Stay tuned for upcoming features.

New features

There is a new version of sudo right around the corner. Version 1.9 will include many interesting new features. Here are the most important planned features:

Conclusion

I hope this article proved to you that sudo is a lot more than just a simple prefix. There are tons of possibilities to fine-tune permissions on your system. You can not only fine-tune permissions but also improve security by checking digests. Session recordings enable you to check what is happening on your systems. You can also extend the functionality of sudo using plugins, either using something already available or writing your own. Finally, given the list of upcoming features, you can see that even if sudo is decades old, it is a living project that is constantly evolving.

If you want to learn more about sudo , here are a few resources:

[Nov 08, 2019] Five tips for using revision control in operations by Elizabeth K. Joseph

Nov 08, 2019 | opensource.com

Whether you're still using Subversion (SVN), or have moved to a distributed system like Git, revision control has found its place in modern operations infrastructures. If you listen to talks at conferences and see what new companies are doing, it can be easy to assume that everyone is now using revision control, and using it effectively. Unfortunately that's not the case. I routinely interact with organizations who either don't track changes in their infrastructure at all, or are not doing so in an effective manner.

If you're looking for a way to convince your boss to spend the time to set it up, or are simply looking for some tips to improve how you use it, the following are five tips for using revision control in operations.

1. Use revision control

A long time ago in a galaxy far, far away, systems administrators would log into servers and make changes to configuration files and restart services manually. Over time we've worked to automate much of this, to the point that members of your operations team may not even have login access to your servers. Everything may be centrally managed through your configuration management system, or other tooling, to automatically manage services. Whatever you're using to handle your configurations and expectations in your infrastructure, you should have a history of changes made to it.

Having a history of changes allows you to:

This first move can often be the hardest thing for an organization to do. You're moving from static configurations, or configuration management files on a filesystem, into a revision control system which changes the process, and often the speed at which changes can be made. Your engineers need to know how to use revision control and get used to the idea that all changes they put into production will be tracked by other people in the organization.

2. Have a plan for what should be put in revision control

This has a few major components: make sure you have multiple repositories in your infrastructure that are targeted at specific services; don't put auto-generated content or binaries into revision control; and make sure you're working securely.

First, you typically want to split up different parts of your services into different repositories. This allows fine-tuned control of who has access to commit to specific service repositories. It also prevents a repository from getting too large, which can complicate the life of your systems administrators who are trying to copy it onto their systems.

You may not believe that a repository can get very big, since it's just text files, but you'll have a different perspective when you have been using a repository for five years, and every copy includes every change ever made. Let me show you the system-config repository for the OpenStack Infrastructure project. The first commit to this project was made in July 2011:

elizabeth@r2d2$:~$ time git clone https://git.openstack.org/openstack-infra/system-config
Cloning into 'system-config'...
remote: Counting objects: 79237, done.
remote: Compressing objects: 100% (37446/37446), done.
remote: Total 79237 (delta 50215), reused 64257 (delta 35955)
Receiving objects: 100% (79237/79237), 10.52 MiB | 2.78 MiB/s, done.
Resolving deltas: 100% (50215/50215), done.
Checking connectivity... done.

real    0m7.600s
user    0m3.344s
sys     0m0.680s

That's over 10M of data for a text-only repository over five years.

Again, yes, text-only. You typically want to avoid stuffing binaries into revision control. You often don't get diffs on these and they just bloat your repository. Find a better way to distribute your binaries. You also don't want your auto-generated files in revision control. Put the configuration file that creates those auto-generated files into revision control, and let your configuration management tooling do its work to auto-generate the files.

Finally, split out all secret data into separate repositories. Organizations can get a considerable amount of benefit from allowing all of their technical staff see their repositories, but you don't necessarily want to expose every private SSH or SSL key to everyone. You may also consider open sourcing some of your tooling some day, so making sure you have no private data in your repository from the beginning will prevent a lot of headaches later on.

3. Make it your canonical place for changes, and deploy from it

Your revision control system must be a central part of your infrastructure. You can't just update it when you remember to, or add it to a checklist of things you should be doing when you make a change in production. Any time someone makes a change, it must be tracked in revision control. If it's not, file a bug and make sure that configuration file is added to revision control in the future.

This also means you should deploy from the configuration tracked in your revision control system. No one should be able to log into a server and make changes without it being put through revision control, except in the rare case of an actual emergency where you have a strict process for getting back on track as soon as the emergency has concluded.

4. Use pre-commit scripts

Humans are pretty forgetful, and we sysadmins are routinely distracted and work in very interrupt-driven environments. Help yourself out and provide your systems administrators with some scripts to remind them what the change message should include.

These reminders may include:

As a bonus, this also documents what reviewers of your change should look for.
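As a concrete illustration, a Git commit-msg hook can refuse commit messages that lack the pieces you care about; the TICKET- pattern below is an assumption about your tracker, not something any particular tool mandates:

#!/usr/bin/env python3
# Sketch of a .git/hooks/commit-msg hook: reject commit messages that do not
# reference a ticket, and remind the committer what reviewers will look for.
import re
import sys

message = open(sys.argv[1]).read()            # Git passes the message file path
if not re.search(r"TICKET-\d+", message):     # assumed ticket ID format
    sys.stderr.write(
        "Commit message must reference a ticket (e.g., TICKET-1234).\n"
        "Also describe what changed, why, and how it was tested.\n"
    )
    sys.exit(1)                               # non-zero exit aborts the commit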

Wait, reviewers? That's #5.

5: Hook it all into a code review system

Now that you have a healthy revision control system set up, you have an excellent platform for adding another great tool for systems administrators: code review. This should typically be a peer-review system where your fellow systems administrators of all levels can review changes, and your changes will meet certain agreed-upon criteria for merging. This allows a whole team to take responsibility for a change, and not just the person who came up with it. I like to joke that since changes on our team require two people to approve, it becomes the fault of three people when something goes wrong, not just one!

Starting out, you don't need to do anything fancy, maybe just have team members submit a merge proposal or pull request and socially make sure someone other than the proposer is the one to merge it. Eventually you can look into more sophisticated code review systems. Most code review tools allow features like inline commenting and discussion-style commenting which provide a non-confrontational way to suggest changes to your colleagues. It's also great for remotely distributed teams who may not be talking face to face about changes, or even awake at the same time.

Bonus: Do tests on every commit!

You have everything in revision control, you have your fellow humans review it, why not add some robots? Start with simple things: Computers are very good at making sure files are alphabetized, humans are not. Computers are really good at making sure syntax is correct (the right number of spaces/tabs?); checking for these things is a waste of time for your brilliant systems engineers. Once you have simple tests in place, start adding unit tests, functional tests, and integration testing so you are confident that your changes won't break in production. This will also help you find where you haven't automated your production infrastructure and solve those problems in your development environment first.
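As a rough illustration of the "simple things" level, a per-commit check might look like this sketch (the file names and paths are assumptions for the example):

#!/bin/bash
# Fail the commit if a sorted list is out of order or a script has a syntax error.
set -e

if ! sort -C admins.txt; then
    echo "admins.txt is not alphabetized" >&2
    exit 1
fi

for script in scripts/*.sh; do
    bash -n "$script"   # parse only; catches syntax errors without running anything
done

echo "All simple checks passed."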

4 Comments

Jenifer Soflous on 25 Jul 2016

Great. This article is very useful for my research. Thank you so much to the author.

Ben Cotton on 25 Jul 2016

When I worked on a team of 10-ish, we had our SVN server email all commits to the team. This gave everyone a chance to see what changes were being made in real time, even if it was to a part of the system they didn't normally touch. I personally found it really useful in improving my own work. I'd love to see more articles expanding on each of your points here (hint hint!)

Elizabeth K. Joseph on 27 Jul 2016

The email point is a good one! Our code review system lets us tune email notifications, and it's also allowed me to improve my own work and keep a better handle on what's going on with the team when I'm traveling (and thus not hovering over IRC).

Shawn H Corey on 25 Jul 2016

For anyone using revision control, not just admins.

1. Add a critic to check-in. A critic checks not only syntax but also common bugs.

2. Add a tidy to check-in. It reformats the code into The Style. That way, developers don't have to waste time making their code conform to The Style.

[Nov 08, 2019] Winterize your Bash prompt in Linux

Nov 08, 2019 | opensource.com

Your Linux terminal probably supports Unicode, so why not take advantage of that and add a seasonal touch to your prompt? 11 Dec 2018 Jason Baker (Red Hat)


Hello once again for another installment of the Linux command-line toys advent calendar. If this is your first visit to the series, you might be asking yourself what a command-line toy even is. Really, we're keeping it pretty open-ended: it's anything that's a fun diversion at the terminal, and we're giving bonus points for anything holiday-themed.

Maybe you've seen some of these before, maybe you haven't. Either way, we hope you have fun.

Today's toy is super-simple: It's your Bash prompt. Your Bash prompt? Yep! We've got a few more weeks of the holiday season left to stare at it, and even more weeks of winter here in the northern hemisphere, so why not have some fun with it.

Your Bash prompt currently might be a simple dollar sign ( $ ), or more likely, it's something a little longer. If you're not sure what makes up your Bash prompt right now, you can find it in an environment variable called $PS1. To see it, type:

echo $PS1

For me, this returns:

[\u@\h \W]\$

The \u , \h , and \W are special characters for username, hostname, and working directory. There are others you can use as well; for help building out your Bash prompt, you can use EzPrompt , an online generator of PS1 configurations that includes lots of options including date and time, Git status, and more.

You may have other variables that make up your Bash prompt set as well; $PS2 for me contains the closing brace of my command prompt. See this article for more information.

To change your prompt, simply set the environment variable in your terminal like this:

$ PS1='\u is cold: '
jehb is cold:

To set it permanently, add the same code to your /etc/bashrc using your favorite text editor.

So what does this have to do with winterization? Well, chances are that on a modern machine, your terminal supports Unicode, so you're not limited to the standard ASCII character set. You can use any emoji that's a part of the Unicode specification, including a snowflake ❄, a snowman ☃, or a pair of skis 🎿. You've got plenty of wintery options to choose from.

🎄 Christmas Tree
🧥 Coat
🦌 Deer
🧤 Gloves
🤶 Mrs. Claus
🎅 Santa Claus
🧣 Scarf
🎿 Skis
🏂 Snowboarder
❄ Snowflake
☃ Snowman
⛄ Snowman Without Snow
🎁 Wrapped Gift

Pick your favorite, and enjoy some winter cheer. Fun fact: modern filesystems also support Unicode characters in their filenames, meaning you can technically name your next program "❄❄❄❄❄.py" . That said, please don't.
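For example, a quick way to try it for the current session only (assuming a UTF-8 locale):

# A snowman prompt, set just for this terminal session:
PS1='☃ \u@\h \W\$ '

# Or keep the usual layout and simply prepend a snowflake:
PS1='❄ [\u@\h \W]\$ '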

Do you have a favorite command-line toy that you think I ought to include? The calendar for this series is mostly filled out but I've got a few spots left. Let me know in the comments below, and I'll check it out. If there's space, I'll try to include it. If not, but I get some good submissions, I'll do a round-up of honorable mentions at the end.

[Nov 08, 2019] How to change the default shell prompt

Jun 29, 2014 | access.redhat.com
PS1 - The value of this parameter is expanded and used as the primary prompt string. The default value is \u@\h \W\$ .
PS2 - The value of this parameter is expanded as with PS1 and used as the secondary prompt string. The default is "> ".
PS3 - The value of this parameter is used as the prompt for the select command.
PS4 - The value of this parameter is expanded as with PS1 and the value is printed before each command bash displays during an execution trace. The first character of PS4 is replicated multiple times, as necessary, to indicate multiple levels of indirection. The default is "+ ".

Commonly used escapes:
\u = username
\h = hostname
\W = current working directory

# echo $PS1
# PS1='[[prod]\u@\h \W]\$'
[[prod]root@hostname ~]#

Find this line:

Raw
[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "

And change it as needed:

Raw
[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[[prod]\u@\h \W]\\$ "

This solution is part of Red Hat's fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Mike Willis on 6 October 2016

This solution has simply "Red Hat Enterprise Linux" in the Environment section implying it applies to all versions of Red Hat Enterprise Linux.

Editing /etc/bashrc is against the advice of the comments in /etc/bashrc on Red Hat Enterprise Linux 7 which say

Raw
# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

On RHEL 7, instead of the solution suggested above, create a /etc/profile.d/custom.sh which contains:

PS1="[[prod]\u@\h \W]\\$ "

Mike Chanslor on 27 March 2019

Hello Red Hat community! I also found this useful:

Special prompt variable characters:
 \d   The date, in "Weekday Month Date" format (e.g., "Tue May 26"). 

 \h   The hostname, up to the first . (e.g. deckard) 
 \H   The hostname. (e.g. deckard.SS64.com)

 \j   The number of jobs currently managed by the shell. 

 \l   The basename of the shell's terminal device name. 

 \s   The name of the shell, the basename of $0 (the portion following 
      the final slash). 

 \t   The time, in 24-hour HH:MM:SS format. 
 \T   The time, in 12-hour HH:MM:SS format. 
 \@   The time, in 12-hour am/pm format. 

 \u   The username of the current user. 

 \v   The version of Bash (e.g., 2.00) 

 \V   The release of Bash, version + patchlevel (e.g., 2.00.0) 

 \w   The current working directory. 
 \W   The basename of $PWD. 

 \!   The history number of this command. 
 \#   The command number of this command. 

 \$   If you are not root, inserts a "$"; if you are root, you get a "#"  (root uid = 0) 

 \nnn   The character whose ASCII code is the octal value nnn. 

 \n   A newline. 
 \r   A carriage return. 
 \e   An escape character (typically a color code). 
 \a   A bell character.
 \\   A backslash. 

 \[   Begin a sequence of non-printing characters. (like color escape sequences). This
      allows bash to calculate word wrapping correctly.

 \]   End a sequence of non-printing characters.
Using single quotes instead of double quotes when exporting your PS variables is recommended. It makes the prompt a tiny bit faster to evaluate, plus you can then do an echo $PS1 to see the current prompt settings.

[Nov 08, 2019] 7 Bash history shortcuts you will actually use by Ian Miell

Nov 08, 2019 | opensource.com

7 Bash history shortcuts you will actually use. Save time on the command line with these essential Bash shortcuts. 02 Oct 2019

This article outlines the shortcuts I actually use every day. It is based on some of the contents of my book, Learn Bash the hard way (you can read a preview of it to learn more).

When people see me use these shortcuts, they often ask me, "What did you do there!?" There's minimal effort or intelligence required, but to really learn them, I recommend using one each day for a week, then moving to the next one. It's worth taking your time to get them under your fingers, as the time you save will be significant in the long run.

1. The "last argument" one: !$

If you only take one shortcut from this article, make it this one. It substitutes in the last argument of the last command into your line.

Consider this scenario:

$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory

Ach, I put the wrongfile filename in my command. I should have put rightfile instead.

You might decide to retype the last command and replace wrongfile with rightfile completely. Instead, you can type:

$ mv /path/to/rightfile !$
mv /path/to/rightfile /some/other/place

and the command will work.

There are other ways to achieve the same thing in Bash with shortcuts, but this trick of reusing the last argument of the last command is one I use the most.

2. The " n th argument" one: !:2

Ever done anything like this?

$ tar -cvf afolder afolder.tar
tar: failed to open

Like many others, I get the arguments to tar (and ln ) wrong more often than I would like to admit.


When you mix up arguments like that, you can run:

$ !:0 !:1 !:3 !:2
tar -cvf afolder.tar afolder

and your reputation will be saved.

The last command's items are zero-indexed and can be substituted in with the number after the !: .

Obviously, you can also use this to reuse specific arguments from the last command rather than all of them.
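For example, to reuse just one argument from the previous command (the host and file names here are made up for illustration):

$ scp backup.tar.gz admin@backuphost:/srv/backups/
$ sha256sum !:1
sha256sum backup.tar.gz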

3. The "all the arguments" one: !:1-$

Imagine I run a command like:

$ grep '(ping|pong)' afile

The arguments are correct; however, I want to match ping or pong in a file, but I used grep rather than egrep .

I start typing egrep , but I don't want to retype the other arguments. So I can use the !:1-$ shortcut to ask for all the arguments to the previous command from the second one (remember they're zero-indexed) to the last one (represented by the $ sign).

$ egrep !:1-$
egrep '(ping|pong)' afile
ping

You don't need to pick 1-$ ; you can pick a subset like 1-2 or 3-9 (if you had that many arguments in the previous command).

4. The "last but n " one: !-2:$

The shortcuts above are great when I know immediately how to correct my last command, but often I run commands after the original one, which means that the last command is no longer the one I want to reference.

For example, using the mv example from before, if I follow up my mistake with an ls check of the folder's contents:

$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory
$ ls /path/to/
rightfile

I can no longer use the !$ shortcut.

In these cases, I can insert -n: (where n is the number of commands to go back in the history) after the ! to grab the last argument from an older command:

$ mv /path/to/rightfile !-2:$
mv /path/to/rightfile /some/other/place

Again, once you learn it, you may be surprised at how often you need it.

5. The "get me the folder" one: !$:h

This one looks less promising on the face of it, but I use it dozens of times daily.

Imagine I run a command like this:

$ tar -cvf system.tar /etc/system
tar: /etc/system: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors.

The first thing I might want to do is go to the /etc folder to see what's in there and work out what I've done wrong.

I can do this at a stroke with:

$ cd !$:h
cd /etc

This one says: "Get the last argument to the last command (/etc/system) and take off its last filename component, leaving only the /etc."

6. The "the current line" one: !#:1

For years, I occasionally wondered if I could reference an argument on the current line before finally looking it up and learning it. I wish I'd done so a long time ago. I most commonly use it to make backup files:

$ cp /path/to/some/file !#:1.bak
cp /path/to/some/file /path/to/some/file.bak

but once it's under your fingers, it can be a very quick alternative to retyping or copy-pasting long paths.

7. The "search and replace" one: !!:gs

This one searches across the referenced command and replaces the text between the first pair of slashes with the text between the second pair.

Say I want to tell the world that my s key does not work and outputs f instead:

$ echo my f key doef not work
my f key doef not work

Then I realize that I was just hitting the f key by accident. To replace all the f s with s es, I can type:

$ !!:gs/f/s/
echo my s key does not work
my s key does not work

It doesn't work only on single characters; I can replace words or sentences, too:

$ !!:gs/does/did/
echo my s key did not work
my s key did not work

Test them out

Just to show you how these shortcuts can be combined, can you work out what these toenail clippings will output?

$ ping !#:0:gs/i/o
$ vi /tmp/!:0.txt
$ ls !$:h
$ cd !-2:h
$ touch !$!-3:$ !! !$.txt
$ cat !:1-$

Conclusion

Bash can be an elegant source of shortcuts for the day-to-day command-line user. While there are thousands of tips and tricks to learn, these are my favorites that I frequently put to use.

If you want to dive even deeper into all that Bash can teach you, pick up my book, Learn Bash the hard way or check out my online course, Master the Bash shell .


This article was originally posted on Ian's blog, Zwischenzugs.com , and is reused with permission.

[Nov 08, 2019] A guide to logging in Python Opensource.com

Nov 08, 2019 | opensource.com

A guide to logging in Python. Logging is a critical way to understand what's going on inside an application. 13 Sep 2017 Mario Corchero


When applications are running in production, they become black boxes that need to be traced and monitored. One of the simplest, yet most important ways to do so is by logging. Logging allows us -- at the time we develop our software -- to instruct the program to emit information while the system is running that will be useful for us and our sysadmins.


The same way we document code for future developers, we should direct new software to generate adequate logs for developers and sysadmins. Logs are a critical part of the system documentation about an application's runtime status. When instrumenting your software with logs, think of it like writing documentation for developers and sysadmins who will maintain the system in the future.

Some purists argue that a disciplined developer who uses logging and testing should hardly need an interactive debugger. If we cannot reason about our application during development with verbose logging, it will be even harder to do it when our code is running in production.

This article looks at Python's logging module, its design, and ways to adapt it for more complex use cases. This is not intended as documentation for developers, rather as a guide to show how the Python logging module is built and to encourage the curious to delve deeper.

Why use the logging module?

A developer might argue, why aren't simple print statements sufficient? The logging module offers multiple benefits, including:

This separation of what we log from how we log enables collaboration between different parts of the software. As an example, it allows the developer of a framework or library to add logs and let the sysadmin or person in charge of the runtime configuration decide what should be logged at a later point.

What's in the logging module

The logging module beautifully separates the responsibility of each of its parts (following the Apache Log4j API's approach). Let's look at how a log line travels around the module's code and explore its different parts.

Logger

Loggers are the objects a developer usually interacts with. They are the main APIs that indicate what we want to log.

Given an instance of a logger , we can categorize and ask for messages to be emitted without worrying about how or where they will be emitted.

For example, when we write logger.info("Stock was sold at %s", price) we have the following model in mind:


We request a line and we assume some code is executed in the logger that makes that line appear in the console/file. But what actually is happening inside?

Log records

Log records are packages that the logging module uses to pass all required information around. They contain information about the function where the log was requested, the string that was passed, arguments, call stack information, etc.

These are the objects that are being logged. Every time we invoke our loggers, we are creating instances of these objects. But how do objects like these get serialized into a stream? Via handlers!

Handlers

Handlers emit the log records into any output. They take log records and handle them according to the purpose they were built for.

As an example, a FileHandler will take a log record and append it to a file.

The standard logging module already comes with multiple built-in handlers like:

We have now a model that's closer to reality:


But most handlers work with simple strings (SMTPHandler, FileHandler, etc.), so you may be wondering how those structured LogRecords are transformed into easy-to-serialize bytes...

Formatters

Let me present the Formatters. Formatters are in charge of serializing the metadata-rich LogRecord into a string. There is a default formatter if none is provided.

The generic formatter class provided by the logging library takes a template and style as input. Then placeholders can be declared for all the attributes in a LogRecord object.

As an example: '%(asctime)s %(levelname)s %(name)s: %(message)s' will generate logs like 2017-07-19 15:31:13,942 INFO parent.child: Hello EuroPython .

Note that the attribute message is the result of interpolating the log's original template with the arguments provided. (e.g., for logger.info("Hello %s", "Laszlo") , the message will be "Hello Laszlo").

All default attributes can be found in the logging documentation .

OK, now that we know about formatters, our model has changed again:

Filters

The last objects in our logging toolkit are filters.

Filters allow for finer-grained control of which logs should be emitted. Multiple filters can be attached to both loggers and handlers. For a log to be emitted, all filters should allow the record to pass.

Users can declare their own filters as objects using a filter method that takes a record as input and returns True / False as output.

With this in mind, here is the current logging workflow:

The logger hierarchy

At this point, you might be impressed by the amount of complexity and configuration that the module is hiding so nicely for you, but there is even more to consider: the logger hierarchy.

We can create a logger via logging.getLogger(<logger_name>) . The string passed as an argument to getLogger can define a hierarchy by separating the elements using dots.

As an example, logging.getLogger("parent.child") will create a logger "child" with a parent logger named "parent." Loggers are global objects managed by the logging module, so they can be retrieved conveniently anywhere during our project.

Logger instances are also known as channels. The hierarchy allows the developer to define the channels and their hierarchy.

After the log record has been passed to all the handlers within the logger, the parents' handlers will be called recursively until we reach the top logger (defined as an empty string) or a logger has configured propagate = False . We can see it in the updated diagram:


Note that the parent logger is not called, only its handlers. This means that filters and other code in the logger class won't be executed on the parents. This is a common pitfall when adding filters to loggers.

Recapping the workflow

We've examined the split in responsibility and how we can fine tune log filtering. But there are two other attributes we haven't mentioned yet:

  1. Loggers can be disabled, thereby not allowing any record to be emitted from them.
  2. An effective level can be configured in both loggers and handlers.

As an example, when a logger has configured a level of INFO , only INFO levels and above will be passed. The same rule applies to handlers.

With all this in mind, the final flow diagram in the logging documentation looks like this:

How to use logging

Now that we've looked at the logging module's parts and design, it's time to examine how a developer interacts with it. Here is a code example:

import logging

def sample_function(secret_parameter):
    logger = logging.getLogger(__name__)  # __name__=projectA.moduleB
    logger.debug("Going to perform magic with '%s'", secret_parameter)
    ...
    try:
        result = do_magic(secret_parameter)
    except IndexError:
        logger.exception("OMG it happened again, someone please tell Laszlo")
    except:
        logger.info("Unexpected exception", exc_info=True)
        raise
    else:
        logger.info("Magic with '%s' resulted in '%s'", secret_parameter, result, stack_info=True)

This creates a logger using the module __name__ . It will create channels and hierarchies based on the project structure, as Python modules are concatenated with dots.

The logger variable references the "projectA.moduleB" logger, which has "projectA" as its parent, which in turn has the root logger as its parent.

On line 5, we see how to perform calls to emit logs. We can use one of the methods debug , info , error , or critical to log using the appropriate level.

When logging a message, in addition to the template arguments, we can pass keyword arguments with specific meaning. The most interesting are exc_info and stack_info . These will add information about the current exception and the stack frame, respectively. For convenience, a method exception is available in the logger objects, which is the same as calling error with exc_info=True .

These are the basics of how to use the logger module. ʘ‿ʘ. But it is also worth mentioning some uses that are usually considered bad practices.

Greedy string formatting

Using logger.info("string template {}".format(argument)) should be avoided whenever possible in favor of logger.info("string template %s", argument) . This is a better practice, as the actual string interpolation happens only if the log is emitted. Not doing so can lead to wasted cycles when the configured level is above INFO, as the interpolation will still occur even though the message is discarded.

Capturing and formatting exceptions

Quite often, we want to log information about the exception in a catch block, and it might feel intuitive to use:

try:
    ...
except Exception as error:
    logger.info("Something bad happened: %s", error)

But that code can give us log lines like Something bad happened: "secret_key." This is not that useful. If we use exc_info as illustrated previously, it will produce the following:

try:
    ...
except Exception:
    logger.info("Something bad happened", exc_info=True)

Something bad happened
Traceback (most recent call last):
  File "sample_project.py", line 10, in code
    inner_code()
  File "sample_project.py", line 6, in inner_code
    x = data["secret_key"]
KeyError: 'secret_key'

This not only contains the exact source of the exception, but also the type.

Configuring our loggers

Instrumenting our software is easy, but we also need to configure the logging stack and specify how those records will be emitted.

There are multiple ways to configure the logging stack.

BasicConfig

This is by far the simplest way to configure logging. Just doing logging.basicConfig(level="INFO") sets up a basic StreamHandler that will log everything on the INFO and above levels to the console. There are arguments to customize this basic configuration. Some of them are:

Argument (with example value):
filename  Specifies that a FileHandler should be created, using the specified filename, rather than a StreamHandler (e.g., /var/logs/logs.txt)
format    Use the specified format string for the handler (e.g., '%(asctime)s %(message)s')
datefmt   Use the specified date/time format (e.g., "%H:%M:%S")
level     Set the root logger level to the specified level (e.g., "INFO")

This is a simple and practical way to configure small scripts.

Note, basicConfig only works the first time it is called in a runtime. If you have already configured your root logger, calling basicConfig will have no effect.

DictConfig

The configuration for all elements and how to connect them can be specified as a dictionary. This dictionary should have different sections for loggers, handlers, formatters, and some basic global parameters.

Here's an example:

config = {
    'disable_existing_loggers': False,
    'version': 1,
    'formatters': {
        'short': {
            'format': '%(asctime)s %(levelname)s %(name)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'INFO',
            'formatter': 'short',
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': 'ERROR',
        },
        'plugins': {
            'handlers': ['console'],
            'level': 'INFO',
            'propagate': False
        }
    },
}

import logging.config
logging.config.dictConfig(config)

When invoked, dictConfig will disable all existing loggers unless disable_existing_loggers is set to false. Setting it to false, as in the example above, is usually what you want, because many modules declare a global logger that is instantiated at import time, before dictConfig is called.

You can see the schema that can be used for the dictConfig method . Often, this configuration is stored in a YAML file and loaded from there. Many developers prefer this over using fileConfig , as it offers better support for customization.

Extending logging

Thanks to the way it is designed, it is easy to extend the logging module. Let's see some examples:

Logging JSON

If we want, we can log JSON by creating a custom formatter that transforms the log records into a JSON-encoded string:

import logging
import logging.config
import json

ATTR_TO_JSON = ['created', 'filename', 'funcName', 'levelname', 'lineno', 'module', 'msecs', 'msg', 'name', 'pathname', 'process', 'processName', 'relativeCreated', 'thread', 'threadName']

class JsonFormatter:
    def format(self, record):
        obj = {attr: getattr(record, attr)
               for attr in ATTR_TO_JSON}
        return json.dumps(obj, indent=4)

handler = logging.StreamHandler()
handler.formatter = JsonFormatter()
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.error("Hello")

Adding further context

On the formatters, we can specify any log record attribute.

We can inject attributes in multiple ways. In this example, we abuse filters to enrich the records.

import logging
import logging.config

GLOBAL_STUFF = 1

class ContextFilter(logging.Filter):
    def filter(self, record):
        global GLOBAL_STUFF
        GLOBAL_STUFF += 1
        record.global_data = GLOBAL_STUFF
        return True

handler = logging.StreamHandler()
handler.formatter = logging.Formatter("%(global_data)s %(message)s")
handler.addFilter(ContextFilter())
logger = logging.getLogger(__name__)
logger.addHandler(handler)

logger.error("Hi1")
logger.error("Hi2")

This effectively adds an attribute to all the records that go through that logger. The formatter will then include it in the log line.

Note that this impacts all log records in your application, including libraries or other frameworks that you might be using and for which you are emitting logs. It can be used to log things like a unique request ID on all log lines to track requests or to add extra contextual information.

Starting with Python 3.2, you can use setLogRecordFactory to capture all log record creation and inject extra information. The extra attribute and the LoggerAdapter class may also be of interest.

Buffering logs

Sometimes we would like to have access to debug logs when an error happens. This is feasible by creating a buffered handler that will log the last debug messages after an error occurs. See the following code as a non-curated example:

import logging
import logging.handlers

class SmartBufferHandler(logging.handlers.MemoryHandler):
    def __init__(self, num_buffered, *args, **kwargs):
        kwargs["capacity"] = num_buffered + 2  # +2: one for the current record, one for pre-pop
        super().__init__(*args, **kwargs)

    def emit(self, record):
        if len(self.buffer) == self.capacity - 1:
            self.buffer.pop(0)
        super().emit(record)

handler = SmartBufferHandler(num_buffered=2, target=logging.StreamHandler(), flushLevel=logging.ERROR)
logger = logging.getLogger(__name__)
logger.setLevel("DEBUG")
logger.addHandler(handler)

logger.error("Hello1")
logger.debug("Hello2")  # This line won't be logged
logger.debug("Hello3")
logger.debug("Hello4")
logger.error("Hello5")  # The error flushes the buffer, so the last two debug lines are logged as well

For more information

This introduction to the logging library's flexibility and configurability aims to demonstrate the beauty of how its design splits concerns. It also offers a solid foundation for anyone interested in a deeper dive into the logging documentation and the how-to guide . Although this article isn't a comprehensive guide to Python logging, here are answers to a few frequently asked questions.

My library emits a "no logger configured" warning

Check how to configure logging in a library from "The Hitchhiker's Guide to Python."

What happens if a logger has no level configured?

The effective level of the logger will then be defined recursively by its parents.

All my logs are in local time. How do I log in UTC?

Formatters are the answer! You need to set the converter attribute of your formatter to generate UTC times. Use converter = time.gmtime .

[Nov 08, 2019] The monumental impact of C Opensource.com

Nov 08, 2019 | opensource.com

The monumental impact of C. The season finale of Command Line Heroes offers a lesson in how a small community of open source enthusiasts can change the world. 01 Oct 2019 Matthew Broberg (Red Hat)


The Command Line Heroes podcast explores C's origin story in a way that showcases the longevity and power of its design. It's a perfect synthesis of all the languages discussed throughout the podcast's third season and this series of articles .

C is such a fundamental language that many of us forget how much it has changed. Technically a "high-level language," in the sense that it requires a compiler to be runnable, it's as close to assembly language as people like to get these days (outside of specialized, low-memory environments). It's also considered to be the language that made nearly all languages that came after it possible.

The path to C began with failure

While the myth persists that all great inventions come from highly competitive garage dwellers, C's story is more fit for the Renaissance period.

In the 1960s, Bell Labs in suburban New Jersey was one of the most innovative places of its time. Jon Gertner, author of The idea factory , describes the culture of the time marked by optimism and the excitement to solve tough problems. Instead of monetization pressures with tight timelines, Bell Labs offered seemingly endless funding for wild ideas. It had a research and development ethos that aligns well with today's open leadership principles . The results were significant and prove that brilliance can come without the promise of VC funding or an IPO.


The challenge back then was terminal sharing: finding a way for lots of people to access the (very limited number of) available computers. Before there was a scalable answer for that, and long before we had a shell like Bash , there was the Multics project. It was a hypothetical operating system where hundreds or even thousands of developers could share time on the same system. This was a dream of John McCarthy, creator of Lisp and the term artificial intelligence (AI), as I recently explored .

Joy Lisi Rankin, author of A people's history of computing in the United States , describes what happened next. There was a lot of public interest in driving forward with Multics' vision of more universally available timesharing. Academics, scientists, educators, and some in the broader public were looking forward to this computer-powered future. Many advocated for computing as a public utility, akin to electricity, and the push toward timesharing was a global movement.

Up to that point, high-end mainframes topped out at 40-50 terminals per system. The change of scale was ambitious and eventually failed, as Warren Toomey writes in IEEE Spectrum :

"Over five years, AT&T invested millions in the Multics project, purchasing a GE-645 mainframe computer and dedicating to the effort many of the top researchers at the company's renowned Bell Telephone Laboratories -- including Thompson and Ritchie, Joseph F. Ossanna, Stuart Feldman, M. Douglas McIlroy, and the late Robert Morris. But the new system was too ambitious, and it fell troublingly behind schedule. In the end, AT&T's corporate leaders decided to pull the plug."

Bell Labs pulled out of the Multics program in 1969. Multics wasn't going to happen.

The fellowship of the C

Funding wrapped up, and the powerful GE-645 mainframe was assigned to other tasks inside Bell Labs. But that didn't discourage everyone.

Among the last holdouts from the Multics project were four men who felt passionately tied to the project: Ken Thompson, Dennis Ritchie, Doug McIlroy, and J.F. Ossanna. These four diehards continued to muse and scribble ideas on paper. Thompson and Ritchie developed a game called Space Travel for the PDP-7 minicomputer. While they were working on that, Thompson started implementing all those crazy hand-written ideas about filesystems they'd developed among the wreckage of Multics.

A PDP-7 minicomputer was not top-of-the-line technology at the time, but the team implemented foundational technologies that changed the future of programming languages and operating systems alike.

That's worth emphasizing: Some of the original filesystem specifications were written by hand and then programmed on what was effectively a toy compared to the systems they were using to build Multics. Wikipedia's Ken Thompson page dives deeper into what came next:

"While writing Multics, Thompson created the Bon programming language. He also created a video game called Space Travel . Later, Bell Labs withdrew from the MULTICS project. In order to go on playing the game, Thompson found an old PDP-7 machine and rewrote Space Travel on it. Eventually, the tools developed by Thompson became the Unix operating system : Working on a PDP-7, a team of Bell Labs researchers led by Thompson and Ritchie, and including Rudd Canaday, developed a hierarchical file system , the concepts of computer processes and device files , a command-line interpreter , pipes for easy inter-process communication, and some small utility programs. In 1970, Brian Kernighan suggested the name 'Unix,' in a pun on the name 'Multics.' After initial work on Unix, Thompson decided that Unix needed a system programming language and created B , a precursor to Ritchie's C ."

As Warren Toomey documented in the IEEE Spectrum article mentioned above, Unix showed promise in a way the Multics project never did. After winning over the team and doing a lot more programming, the pathway to Unix was paved.

Getting from B to C in Unix

Thompson quickly created a Unix language he called B. B inherited much from its predecessor BCPL, but it wasn't enough of a breakaway from older languages. B didn't know data types, for starters. It's considered a typeless language, which meant its "Hello World" program looked like this:

main( ) {
extrn a, b, c;
putchar(a); putchar(b); putchar(c); putchar('!*n');
}

a 'hell';
b 'o, w';
c 'orld';

Even if you're not a programmer, it's clear that carving up strings four characters at a time would be limiting. It's also worth noting that this text is considered the original "Hello World" from Brian Kernighan's 1972 book, A tutorial introduction to the language B (although that claim is not definitive).


Typelessness aside, B's assembly-language counterparts were still yielding programs faster than was possible using the B compiler's threaded-code technique. So, from 1971 to 1973, Ritchie modified B. He added a "character type" and built a new compiler so that it didn't have to use threaded code anymore. After two years of work, B had become C.

The right abstraction at the right time

C's use of types and ease of compiling down to efficient assembly code made it the perfect language for the rise of minicomputers, which speak in bytecode. B was eventually overtaken by C. Once C became the language of Unix, it became the de facto standard across the budding computer industry. Unix was the sharing platform of the pre-internet days. The more people wrote C, the better it got, and the more it was adopted. It eventually became an open standard itself. According to the Brief history of C programming language :

"For many years, the de facto standard for C was the version supplied with the Unix operating system. In the summer of 1983 a committee was established to create an ANSI (American National Standards Institute) standard that would define the C language. The standardization process took six years (much longer than anyone reasonably expected)."

How influential is C today? A quick review reveals:

Decades after they started as scrappy outsiders, Thompson and Ritchie are praised as titans of the programming world. They shared 1983's Turing Award, and in 1998, received the National Medal of Technology for their work on the C language and Unix.


But Doug McIlroy and J.F. Ossanna deserve their share of praise, too. All four of them are true Command Line Heroes.

Wrapping up the season

Command Line Heroes has completed an entire season of insights into the programming languages that affect how we code today. It's been a joy to learn about these languages and share them with you. I hope you've enjoyed it as well!

[Nov 08, 2019] A Linux user's guide to Logical Volume Management Opensource.com

Nov 08, 2019 | opensource.com

In Figure 1, two complete physical hard drives and one partition from a third hard drive have been combined into a single volume group. Two logical volumes have been created from the space in the volume group, and a filesystem, such as EXT3 or EXT4, has been created on each of the two logical volumes.

Figure 1: LVM allows combining partitions and entire hard drives into Volume Groups.

Adding disk space to a host is fairly straightforward but, in my experience, is done relatively infrequently. The basic steps needed are listed below. You can either create an entirely new volume group or you can add the new space to an existing volume group and either expand an existing logical volume or create a new one.

Adding a new logical volume

There are times when it is necessary to add a new logical volume to a host. For example, after noticing that the directory containing virtual disks for my VirtualBox virtual machines was filling up the /home filesystem, I decided to create a new logical volume in which to store the virtual machine data, including the virtual disks. This would free up a great deal of space in my /home filesystem and also allow me to manage the disk space for the VMs independently.

The basic steps for adding a new logical volume are as follows.

  1. If necessary, install a new hard drive.
  2. Optional: Create a partition on the hard drive.
  3. Create a physical volume (PV) of the complete hard drive or a partition on the hard drive.
  4. Assign the new physical volume to an existing volume group (VG) or create a new volume group.
  5. Create a new logical volume (LV) from the space in the volume group.
  6. Create a filesystem on the new logical volume.
  7. Add appropriate entries to /etc/fstab for mounting the filesystem.
  8. Mount the filesystem.

Now for the details. The following sequence is taken from an example I used as a lab project when teaching about Linux filesystems.

Example

This example shows how to use the CLI to extend an existing volume group to add more space to it, create a new logical volume in that space, and create a filesystem on the logical volume. This procedure can be performed on a running, mounted filesystem.

WARNING: Only the EXT3 and EXT4 filesystems can be resized on the fly on a running, mounted filesystem. Many other filesystems including BTRFS and ZFS cannot be resized.

Install hard drive

If there is not enough space in the volume group on the existing hard drive(s) in the system to add the desired amount of space it may be necessary to add a new hard drive and create the space to add to the Logical Volume. First, install the physical hard drive, and then perform the following steps.

Create Physical Volume from hard drive

It is first necessary to create a new Physical Volume (PV). Use the command below, which assumes that the new hard drive is assigned as /dev/hdd.

pvcreate /dev/hdd

It is not necessary to create a partition of any kind on the new hard drive. This creation of the Physical Volume which will be recognized by the Logical Volume Manager can be performed on a newly installed raw disk or on a Linux partition of type 83. If you are going to use the entire hard drive, creating a partition first does not offer any particular advantages and uses disk space for metadata that could otherwise be used as part of the PV.

Extend the existing Volume Group

In this example we will extend an existing volume group rather than creating a new one; you can choose to do it either way. After the Physical Volume has been created, extend the existing Volume Group (VG) to include the space on the new PV. In this example the existing Volume Group is named MyVG01.

vgextend /dev/MyVG01 /dev/hdd
Create the Logical Volume

First create the Logical Volume (LV) from existing free space within the Volume Group. The command below creates a LV with a size of 50GB. The Volume Group name is MyVG01 and the Logical Volume Name is Stuff.

lvcreate -L 50G --name Stuff MyVG01
Create the filesystem

Creating the Logical Volume does not create the filesystem. That task must be performed separately. The command below creates an EXT4 filesystem that fits the newly created Logical Volume.

mkfs -t ext4 /dev/MyVG01/Stuff
Add a filesystem label

Adding a filesystem label makes it easy to identify the filesystem later in case of a crash or other disk related problems.

e2label /dev/MyVG01/Stuff Stuff
Mount the filesystem

At this point you can create a mount point, add an appropriate entry to the /etc/fstab file, and mount the filesystem.

You should also check to verify the volume has been created correctly. You can use the df , lvs, and vgs commands to do this.
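For completeness, here is a sketch of that last step using the names from this example (the mount point /Stuff and the LABEL come from the e2label command above and are otherwise arbitrary):

# Create the mount point and add an fstab entry that mounts by label
mkdir -p /Stuff
echo 'LABEL=Stuff  /Stuff  ext4  defaults  1 2' >> /etc/fstab

# Mount it and confirm
mount /Stuff
df -h /Stuff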

Resizing a logical volume in an LVM filesystem

The need to resize a filesystem has been around since the beginning of the first versions of Unix and has not gone away with Linux. It has gotten easier, however, with Logical Volume Management.

  1. If necessary, install a new hard drive.
  2. Optional: Create a partition on the hard drive.
  3. Create a physical volume (PV) of the complete hard drive or a partition on the hard drive.
  4. Assign the new physical volume to an existing volume group (VG) or create a new volume group.
  5. Create one or more logical volumes (LV) from the space in the volume group, or expand an existing logical volume with some or all of the new space in the volume group.
  6. If you created a new logical volume, create a filesystem on it. If adding space to an existing logical volume, use the resize2fs command to enlarge the filesystem to fill the space in the logical volume.
  7. Add appropriate entries to /etc/fstab for mounting the filesystem.
  8. Mount the filesystem.
Example

This example describes how to resize an existing Logical Volume in an LVM environment using the CLI. It adds about 50GB of space to the /Stuff filesystem. This procedure can be used on a mounted, live filesystem only with the Linux 2.6 Kernel (and higher) and EXT3 and EXT4 filesystems. I do not recommend that you do so on any critical system, but it can be done and I have done so many times; even on the root (/) filesystem. Use your judgment.

WARNING: Only the EXT3 and EXT4 filesystems can be resized on the fly on a running, mounted filesystem. Many other filesystems including BTRFS and ZFS cannot be resized.

Install the hard drive

If there is not enough space on the existing hard drive(s) in the system to add the desired amount of space it may be necessary to add a new hard drive and create the space to add to the Logical Volume. First, install the physical hard drive and then perform the following steps.

Create a Physical Volume from the hard drive

It is first necessary to create a new Physical Volume (PV). Use the command below, which assumes that the new hard drive is assigned as /dev/hdd.

pvcreate /dev/hdd

It is not necessary to create a partition of any kind on the new hard drive. This creation of the Physical Volume which will be recognized by the Logical Volume Manager can be performed on a newly installed raw disk or on a Linux partition of type 83. If you are going to use the entire hard drive, creating a partition first does not offer any particular advantages and uses disk space for metadata that could otherwise be used as part of the PV.

Add PV to existing Volume Group

For this example, we will use the new PV to extend an existing Volume Group. After the Physical Volume has been created, extend the existing Volume Group (VG) to include the space on the new PV. In this example, the existing Volume Group is named MyVG01.

vgextend /dev/MyVG01 /dev/hdd
Extend the Logical Volume

Extend the Logical Volume (LV) from existing free space within the Volume Group. The command below expands the LV by 50GB. The Volume Group name is MyVG01 and the Logical Volume Name is Stuff.

lvextend -L +50G /dev/MyVG01/Stuff
Expand the filesystem

Extending the Logical Volume will also expand the filesystem if you use the -r option. If you do not use the -r option, that task must be performed separately. The command below resizes the filesystem to fit the newly resized Logical Volume.

resize2fs /dev/MyVG01/Stuff

You should check to verify the resizing has been performed correctly. You can use the df , lvs, and vgs commands to do this.
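As a side note, the extend-and-resize steps can usually be collapsed into a single command; a sketch using the same names as above:

# -r (--resizefs) tells lvextend to run the filesystem resize for you
lvextend -r -L +50G /dev/MyVG01/Stuff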

Tips

Over the years I have learned a few things that can make logical volume management even easier than it already is. Hopefully these tips can prove of some value to you.

I know that, like me, many sysadmins have resisted the change to Logical Volume Management. I hope that this article will encourage you to at least try LVM. I am really glad that I did; my disk management tasks are much easier since I made the switch.

[Nov 08, 2019] Five open source tools for the remote OpenStack team Opensource.com

Nov 08, 2019 | opensource.com

5 tools to support distributed sysadmin teams. 28 Jul 2016 Elizabeth K. Joseph


We also add in a few more provisos:

The following five open source tools allow us to satisfy these goals while remaining a high-functioning team.

1. Text-based chat

We use Internet Relay Chat (IRC), provided to us by the freenode IRC network, which runs on an open source Internet Relay Chat daemon (IRCd). There is a vast array of open source clients available to connect to it with. Our team channel (a chat room, in IRC terms) is the soul of our team. We discuss ongoing issues and challenges, come up with solutions, inform each other of changes in progress, post project status changes and alerts, and have a bot that reports changes submitted to our infrastructure for review. The IRC channel we use is fully public, and we maintain a server that serves channel logs on a web server where anyone can view them.

The following is a quick snapshot of the channel one morning:

<clarkb> hrm no world dump on that failure?
<openstackgerrit> Anita Kuno proposed openstack-infra/storyboard: Add example commands for the Timeline api https://review.openstack.org/337854
<openstackgerrit> Victor Ryzhenkin proposed openstack-infra/project-config: Add openstack/fuel-plugin-murano-tests project https://review.openstack.org/332151
<clarkb> its definitely an io error of some sort
<clarkb> possibly run out of disk space?
<therve> The df output looks normal...
<greghaynes> or, is it writing out to tmpfs?

It takes some getting used to, but once you're familiar with the flow of the channel and working with the team, these conversations and logs become an invaluable resource for keeping up with our work. You can learn more about IRC, from commands to etiquette guidelines by reading VM (Vicky) Brasseur's pair of articles, Getting started with IRC and An IRC quickstart guide .

There are times when a quick voice call is valuable for high-bandwidth communication, so we also run an Asterisk system to support Voice over IP (VoIP) calls.

Running a private IRCd within a company or other organization is common. There are a variety of open source options available; review them with project activity and security track records in mind to select one that fits your needs. If your team has tried IRC but prefers tooling that has a more modern interface and features, like logs built into the interface even when you're not directly connected, you may want to look into Mattermost , an alternative to proprietary SaaS messaging, which you can read more about in Charlie Reisinger's article on how his organization replaced IRC with Mattermost .

2. Etherpad

Etherpads are hosted collaborative text editors that allow a group to edit a document together in real time. Our team uses these for a variety of purposes: collaborating on announcements that go out to the entire project, sharing ideas, talks and topics for our in-person gatherings, writing out maintenance and upgrade plans, and working through tasks during maintenance windows.

The use of an Etherpad typically goes hand-in-hand with our collaboration on IRC and our mailing list, using the Etherpad as a place to keep track of longer-lived, asynchronous notes while we discuss things more broadly directly with each other. We use the open source Etherpad Lite in our infrastructure.

3. Pastebin

A pastebin allows you to paste a large amount of text and returns a URL that you can easily share with members of the team. For our team, that means the ability to share snippets of logs with members of our team that don't have access to the servers, share configuration files, and share instructions for performing a task that has not yet been documented. We prefer a pastebin to pasting large amounts of text in an IRC channel, or the overhead of an Etherpad for text that makes more sense to be read-only.

There are several open source options for running your own pastebin. We're currently using a version of LodgeIt that we maintain ourselves . Tip: If you're running a public pastebin, use a robots.txt file to stop search engines from indexing it. Pastes rarely contain data worth indexing, and keeping them out of search results helps keep spammers off your pastebin.
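A robots.txt that blocks all crawlers is a one-liner; a sketch, assuming your pastebin's document root is /var/www/paste (adjust the path for your web server):

cat > /var/www/paste/robots.txt <<'EOF'
User-agent: *
Disallow: /
EOF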

4. GNU Screen

Officially known as a terminal multiplexer , GNU Screen allows you to run a command, or multiple commands, inside of a screen terminal session and keep applications running after you log out. This has been particularly valuable if we're running a rare long-lived, manually triggered command that would fail if we lose our connection, such as reindexing of our code review system, or performing certain upgrades. Many of the people on our team also use it to keep their IRC clients running 24/7.

Most interestingly, we use GNU Screen sessions for training other root members of our team on system administration tasks that we haven't yet automated or documented. Several users on a system can attach to a screen session at the same time for a collaborative terminal session. We can then walk new members of the team through access to our password vault, or share the procedures for rare or complicated maintenance tasks. It also has built-in tooling to keep a log of the entire session.
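A rough sketch of such a shared session (multiuser mode typically requires the screen binary to be setuid root, and the user and session names here are hypothetical):

# The mentor starts a named session and opens it up:
screen -S training
#   Ctrl-a :multiuser on
#   Ctrl-a :acladd junioradmin

# The trainee attaches to the mentor's session:
screen -x mentor/training

# Either side can toggle logging of the whole session (to screenlog.0):
#   Ctrl-a H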

While it works for us, some critique GNU Screen for being a bit lean on modern features. Alternatives also exist like tmux and Byobu .

5. Git

Git was invented by Linus Torvalds for managing Linux kernel development. Git is now the go-to revision control system for open source projects. It may go without saying that systems administration teams should be using some kind of version control for all changes to their infrastructure, but this is particularly important for a distributed team. With our team spanning time zones, it can be hard to catch up on eight hours of discussion. With Git you can see all changes made to systems by reading commit logs, and learn who changed things.

It also allows us to more easily roll back to prior states, or at least see what prior states existed before possibly disruptive changes were made. When documenting changes more formally for team consumption, the commit history also allows us to effectively describe what and why specific changes were made.

Tip: Always include why a change was made in your commit message. We can typically figure out what was changed by reading the commit itself, but the rationale can be trickier to track down after a change is made, particularly several months later when few people remember.
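A few everyday commands illustrate how this catch-up works in practice (the file path below is just an illustration, not a real repository path):

git log --since='12 hours ago' --stat    # what changed while you were asleep, and why
git show <commit-id>                     # the full diff of one specific change
git blame manifests/ntp.pp               # who last touched each line, and in which commit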

Learn more about the OpenStack infrastructure, and the other tools we use to support the massive development community running OpenStack, by visiting the OpenStack Project Infrastructure documentation.

[Nov 08, 2019] 10 killer tools for the admin in a hurry Opensource.com

Nov 08, 2019 | opensource.com

NixCraft
Use the site's internal search function. With more than a decade of regular updates, there's gold to be found here -- useful scripts and handy hints that can solve your problem straight away. This is often the second place I look after Google.

Webmin
This gives you a nice web interface to remotely edit your configuration files. It cuts down on a lot of time spent having to juggle directory paths and sudo nano , which is handy when you're handling several customers.

Windows Subsystem for Linux
The reality of the modern workplace is that most employees are on Windows, while the grown-up gear in the server room is on Linux. So sometimes you find yourself trying to do admin tasks from (gasp) a Windows desktop.

What do you do? Install a virtual machine? It's actually much faster and far less work to configure if you install the Windows Subsystem for Linux compatibility layer, now available at no cost on Windows 10.

This gives you a Bash terminal in a window where you can run Bash scripts and Linux binaries on the local machine, have full access to both Windows and Linux filesystems, and mount network drives. It's available in Ubuntu, OpenSUSE, SLES, Debian, and Kali flavors.
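For example, from the WSL Bash prompt the Windows drives show up under /mnt, so you can point ordinary Linux tools at Windows-side files (the log path below is made up):

ls /mnt/c/Users                              # browse the Windows filesystem
grep -i error /mnt/c/Users/Public/app.log    # run Linux text tools on a Windows-side file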

mRemoteNG
This is an excellent SSH and remote desktop client for when you have 100+ servers to manage.

Setting up a network so you don't have to do it again

A poorly planned network is the sworn enemy of the admin who hates working overtime.

IP Addressing Schemes that Scale
The diabolical thing about running out of IP addresses is that, when it happens, the network's grown large enough that a new addressing scheme is an expensive, time-consuming pain in the proverbial.

Ain't nobody got time for that!

At some point, IPv6 will finally arrive to save the day. Until then, these one-size-fits-most IP addressing schemes should keep you going, no matter how many network-connected wearables, tablets, smart locks, lights, security cameras, VoIP headsets, and espresso machines the world throws at us.

Linux Chmod Permissions Cheat Sheet
A short but sweet cheat sheet of Bash commands to set permissions across the network. This is so when Bill from Customer Service falls for that ransomware scam, you're recovering just his files and not the entire company's.
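The general idea looks something like this sketch (the user, group, and paths are illustrative):

mkdir -p /srv/share/bill
chown bill:customer-service /srv/share/bill
chmod 700 /srv/share/bill    # only Bill (and root) can write here; set up the other users' directories the same way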

VLSM Subnet Calculator
Just put in the number of networks you want to create from an address space and the number of hosts you want per network, and it calculates what the subnet mask should be for everything.
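As a quick worked example of the arithmetic such a calculator does for you (the addresses are illustrative):

# Four subnets of ~50 hosts each out of 192.168.10.0/24:
# 50 hosts need 6 host bits (2^6 - 2 = 62 usable), so each subnet is a /26
# (mask 255.255.255.192):
#   192.168.10.0/26      usable .1   - .62
#   192.168.10.64/26     usable .65  - .126
#   192.168.10.128/26    usable .129 - .190
#   192.168.10.192/26    usable .193 - .254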

Single-purpose Linux distributions

Need a Linux box that does just one thing? It helps if someone else has already sweated the small stuff on an operating system you can install and have ready immediately.

Each of these has, at one point, made my work day so much easier.

Porteus Kiosk
This is for when you want a computer totally locked down to just a web browser. With a little tweaking, you can even lock the browser down to just one website. This is great for public access machines. It works with touchscreens or with a keyboard and mouse.

Parted Magic
This is an operating system you can boot from a USB drive to partition hard drives, recover data, and run benchmarking tools.

IPFire
Hahahaha, I still can't believe someone called a router/firewall/proxy combo "I pee fire." That's my second favorite thing about this Linux distribution. My favorite is that it's a seriously solid software suite. It's so easy to set up and configure, and there is a heap of plugins available to extend it.

What about your top tools and cheat sheets?

So, how about you? What tools, resources, and cheat sheets have you found to make the workday easier? I'd love to know. Please share in the comments.

[Nov 08, 2019] Command-line tools for collecting system statistics Opensource.com

Nov 08, 2019 | opensource.com

Examining collected data

The output from the sar command can be detailed, or you can choose to limit the data displayed. For example, enter the sar command with no options, which displays only aggregate CPU performance data. The sar command uses the current day by default, starting at midnight, so you should only see the CPU data for today.

On the other hand, using the sar -A command shows all of the data that has been collected for today. Enter the sar -A | less command now and page through the output to view the many types of data collected by SAR, including disk and network usage, CPU context switches (how many times per second the CPU switched from one program to another), page swaps, memory and swap space usage, and much more. Use the man page for the sar command to interpret the results and to get an idea of the many options available. Many of those options allow you to view specific data, such as network and disk performance.

I typically use the sar -A command because many of the types of data available are interrelated, and sometimes I find something that gives me a clue to a performance problem in a section of the output that I might not have looked at otherwise. The -A option displays all of the collected data types.

Look at the entire output of the sar -A | less command to get a feel for the type and amount of data displayed. Be sure to look at the CPU usage data as well as the processes started per second (proc/s) and context switches per second (cswch/s). If the number of context switches increases rapidly, that can indicate that running processes are being swapped off the CPU very frequently.

You can limit the data to overall CPU activity with the sar -u command. Try that and notice that you only get the composite CPU data, not the data for the individual CPUs. Also try the -r option for memory and -S for swap space. You can also combine these options; the following command displays CPU, memory, and swap space:

sar -urS

Using the -p option displays block device names for hard drives instead of the much more cryptic device identifiers, and -d displays only the block devices -- the hard drives. Issue the following command to view all of the block device data in a readable format using the names as they are found in the /dev directory:

sar -dp | less

If you want only data between certain times, you can use -s and -e to define the start and end times, respectively. The following command displays all CPU data, both individual and aggregate for the time period between 7:50 AM and 8:11 AM today:

sar -P ALL -s 07:50:00 -e 08:11:00

Note that all times must be in 24-hour format. If you have multiple CPUs, each CPU is detailed individually, and the average for all CPUs is also given.

The next command uses the -n option to display network statistics for all interfaces:

sar -n ALL | less
Data for previous days

Data collected for previous days can also be examined by specifying the desired log file. Assume that today's date is September 3 and you want to see the data for yesterday; the following command displays all collected data for September 2. The last two digits of each file name are the day of the month on which the data was collected:

sar -A -f /var/log/sa/sa02 | less

You can use the command below, where DD is the day of the month for yesterday:

sar -A -f /var/log/sa/saDD | less
Realtime data

You can also use SAR to display (nearly) realtime data. The following command displays memory usage in 5-second intervals for 10 iterations:

sar -r 5 10

This is an interesting option for sar as it can provide a series of data points for a defined period of time that can be examined in detail and compared.

The /proc filesystem

All of this data for SAR and the system monitoring tools covered in my previous article must come from somewhere. Fortunately, all of that kernel data is easily available in the /proc filesystem. In fact, because the kernel performance data stored there is all in ASCII text format, it can be displayed using simple commands like cat so that the individual programs do not have to load their own kernel modules to collect it. This saves system resources and makes the data more accurate. SAR and the system monitoring tools I discussed in my previous article all collect their data from the /proc filesystem.

Note that /proc is a virtual filesystem and only exists in RAM while Linux is running. It is not stored on the hard drive.

Even though I won't get into detail, the /proc filesystem also contains the live kernel tuning parameters and variables. Thus you can change the kernel tuning by simply changing the appropriate kernel tuning variable in /proc; no reboot is required.
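For example, the swappiness tunable can be read and changed on the fly like this (the change lasts only until the next reboot unless you also persist it in sysctl configuration):

cat /proc/sys/vm/swappiness          # read the current value
echo 10 > /proc/sys/vm/swappiness    # change it immediately, no reboot needed
sysctl -w vm.swappiness=10           # the equivalent change through sysctl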

Change to the /proc directory and list the files there. You will see, in addition to the data files, a large quantity of numbered directories. Each of these directories represents a process where the directory name is the Process ID (PID). You can delve into those directories to locate information about individual processes that might be of interest.

To view this data, simply cat some of the following files:
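A few well-known examples (an illustrative selection, not an exhaustive list):

cat /proc/cpuinfo     # CPU model, core count, and feature flags
cat /proc/meminfo     # detailed memory and swap statistics
cat /proc/loadavg     # load averages and process counts
cat /proc/vmstat      # virtual memory counters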

You will see that, although the data is available in these files, much of it is not annotated in any way. That means you will have work to do to identify and extract the data you want. However, the monitoring tools discussed earlier already do that for the data they are designed to display.

There is so much more data in the /proc filesystem that the best way to learn more about it is to refer to the proc(5) man page, which contains detailed information about the various files found there.

Next time I will pull all this together and discuss how I have used these tools to solve problems.

David Both - David Both is an Open Source Software and GNU/Linux advocate, trainer, writer, and speaker who lives in Raleigh, North Carolina. He is a strong proponent of and evangelist for the "Linux Philosophy." David has been in the IT industry for nearly 50 years. He has taught RHCE classes for Red Hat and has worked at MCI Worldcom, Cisco, and the State of North Carolina. He has been working with Linux and Open Source Software for over 20 years.

[Nov 08, 2019] How to use Sanoid to recover from data disasters Opensource.com

Nov 08, 2019 | opensource.com

Sanoid's companion tool, syncoid, uses filesystem-level snapshot replication to move data from one machine to another, fast. For enormous blobs like virtual machine images, we're talking several orders of magnitude faster than rsync.

If that isn't cool enough already, you don't even necessarily need to restore from backup if you lost the production hardware; you can just boot up the VM directly on the local hotspare hardware, or the remote disaster recovery hardware, as appropriate. So even in case of catastrophic hardware failure , you're still looking at that 59m RPO, <1m RTO.


Backups -- and recoveries -- don't get much easier than this.

The syntax is dead simple:

root@box1:~# syncoid pool/images/vmname root@box2:poolname/images/vmname

Or if you have lots of VMs, like I usually do... recursion!

root@box1:~# syncoid -r pool/images/vmname root@box2:poolname/images/vmname

This makes it not only possible, but easy to replicate multiple-terabyte VM images hourly over a local network, and daily over a VPN. We're not talking enterprise 100mbps symmetrical fiber, either. Most of my clients have 5mbps or less available for upload, which doesn't keep them from automated, nightly over-the-air backups, usually to a machine sitting quietly in an owner's house.
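Scheduling that replication is typically just a couple of cron entries; a minimal /etc/cron.d-style sketch (the hostnames, pool names, and syncoid path are all illustrative) might look like:

# hourly replication to a hot spare on the LAN, nightly replication offsite over the VPN
0 * * * *   root  /usr/sbin/syncoid -r pool/images root@hotspare:backuppool/images
30 2 * * *  root  /usr/sbin/syncoid -r pool/images root@offsite:backuppool/images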

Preventing your own Humpty Level Events

Sanoid is open source software, and so are all its dependencies. You can run Sanoid and Syncoid themselves on pretty much anything with ZFS. I developed it and use it on Linux myself, but people are using it (and I support it) on OpenIndiana, FreeBSD, and FreeNAS too.

You can find the GPLv3 licensed code on the website (which actually just redirects to Sanoid's GitHub project page), and there's also a Chef Cookbook and an Arch AUR repo available from third parties.

[Nov 08, 2019] Introduction to the Linux chgrp and newgrp commands by Alan Formy-Duval

Sep 23, 2019 | opensource.com
The chgrp and newgrp commands help you manage files that need to maintain group ownership. In a recent article, I introduced the chown command, which is used for modifying ownership of files on systems. Recall that ownership is the combination of the user and group assigned to an object. The chgrp and newgrp commands provide additional help for managing files that need to maintain group ownership.

Using chgrp

The chgrp command simply changes the group ownership of a file. It is equivalent to running chown with only a group argument (chown :<group>). You can use:

$ chown :alan mynotes

or:

$ chgrp alan mynotes
Recursive

A few additional arguments to chgrp can be useful at both the command line and in a script. Just like many other Linux commands, chgrp has a recursive argument, -R . You will need this to operate on a directory and its contents recursively, as I'll demonstrate below. I added the -v ( verbose ) argument so chgrp tells me what it is doing:

$ ls -l . conf
.:
drwxrwxr-x 2 alan alan 4096 Aug  5 15:33 conf

conf:
-rw-rw-r-- 1 alan alan 0 Aug  5 15:33 conf.xml
# chgrp -vR delta conf
changed group of 'conf/conf.xml' from alan to delta
changed group of 'conf' from alan to delta

Reference


A reference file (--reference=RFILE) can be used when changing the group on files to match a certain configuration or when you don't know the group, as might be the case when running a script. You can duplicate another file's group (RFILE), referred to as a reference file. For example, to undo the changes made above (recall that a dot [.] refers to the present working directory):
$ chgrp -vR --reference=. conf
Report changes

Most commands have arguments for controlling their output. The most common is -v to enable verbose, and the chgrp command has a verbose mode. It also has a -c ( --changes ) argument, which instructs chgrp to report only when a change is made. Chgrp will still report other things, such as if an operation is not permitted.

The argument -f ( --silent , --quiet ) is used to suppress most error messages. I will use this argument and -c in the next section so it will show only actual changes.

Preserve root

The root (/) of the Linux filesystem should be treated with great respect. If a command mistake is made at this level, the consequences can be terrible and leave a system completely useless, particularly when you are running a recursive command that will make any kind of change -- or worse, deletions. The chgrp command has an argument that can be used to protect and preserve the root: --preserve-root. If this argument is used with a recursive chgrp command on the root, nothing will happen and a message will appear instead:

[root@localhost /]# chgrp -cfR --preserve-root alan /
chgrp: it is dangerous to operate recursively on '/'
chgrp: use --no-preserve-root to override this failsafe

The option has no effect when it's not used in conjunction with recursive. However, if the command is run by the root user, the group of / itself will change, but not that of other files or directories within it:

[alan@localhost /]$ chgrp -c --preserve-root alan /
chgrp: changing group of '/': Operation not permitted
[root@localhost /]# chgrp -c --preserve-root alan /
changed group of '/' from root to alan

Surprisingly, it seems, this is not the default behavior; --no-preserve-root is the default. If you run the command above without the "preserve" option, it will default to "no preserve" mode and possibly change the group of files that shouldn't be changed:

[alan@localhost /]$ chgrp -cfR alan /
changed group of '/dev/pts/0' from tty to alan
changed group of '/dev/tty2' from tty to alan
changed group of '/var/spool/mail/alan' from mail to alan

About newgrp

The newgrp command allows a user to override the current primary group. newgrp can be handy when you are working in a directory where all files must have the same group ownership. Suppose you have a directory called share on your intranet server where different teams store marketing photos. The group is share . As different users place files into the directory, the files' primary groups might become mixed up. Whenever new files are added, you can run chgrp to correct any mix-ups by setting the group to share :

$ cd share
$ ls -l
-rw-r--r--. 1 alan share    0 Aug  7 15:35 pic13
-rw-r--r--. 1 alan alan     0 Aug  7 15:35 pic1
-rw-r--r--. 1 susan delta   0 Aug  7 15:35 pic2
-rw-r--r--. 1 james gamma   0 Aug  7 15:35 pic3
-rw-rw-r--. 1 bill contract 0 Aug  7 15:36 pic4

I covered setgid mode in my article on the chmod command . This would be one way to solve this problem. But, suppose the setgid bit was not set for some reason. The newgrp command is useful in this situation. Before any users put files into the share directory, they can run the command newgrp share . This switches their primary group to share so all files they put into the directory will automatically have the group share , rather than the user's primary group. Once they are finished, users can switch back to their regular primary group with (for example):

newgrp alan
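
For completeness, the setgid alternative mentioned above is a one-time change on the directory itself, so that new files created inside it automatically inherit its group (the path here is illustrative):

chgrp share /path/to/share
chmod g+s /path/to/share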
Conclusion

It is important to understand how to manage users, groups, and permissions. It is also good to know a few alternative ways to work around problems you might encounter since not all environments are set up the same way.

[Nov 07, 2019] Is BDFL a death sentence Opensource.com

Nov 07, 2019 | opensource.com

What happens when a Benevolent Dictator For Life moves on from an open source project? 16 Jul 2018 | Jason Baker (Red Hat)

Guido van Rossum, creator of the Python programming language and Benevolent Dictator For Life (BDFL) of the project, announced his intention to step away.

Below is a portion of his message; the entire email is not terribly long and is worth taking the time to read if you're interested in the circumstances leading to van Rossum's departure.

I would like to remove myself entirely from the decision process. I'll still be there for a while as an ordinary core dev, and I'll still be available to mentor people -- possibly more available. But I'm basically giving myself a permanent vacation from being BDFL, and you all will be on your own.

After all that's eventually going to happen regardless -- there's still that bus lurking around the corner, and I'm not getting younger... (I'll spare you the list of medical issues.)

I am not going to appoint a successor.

So what are you all going to do? Create a democracy? Anarchy? A dictatorship? A federation?

It's worth zooming out for a moment to consider the issue at a larger scale. How an open source project is governed can have very real consequences on the long-term sustainability of its user and developer communities alike.

BDFLs tend to emerge from passion projects, where a single individual takes on a project before growing a community around it. Projects emerging from companies or other large organizations often lack this role, as the distribution of authority is more formalized, or at least more dispersed, from the start. Even then, it's not uncommon to need to figure out how to transition from one form of project governance to another as the community grows and expands.


But regardless of how an open source project is structured, ultimately, there needs to be some mechanism for deciding how to make technical decisions. Someone, or some group, has to decide which commits to accept, which to reject, and more broadly what direction the project is going to take from a technical perspective.

Surely the Python project will be okay without van Rossum. The Python Software Foundation has plenty of formalized structure in place bringing in broad representation from across the community. There's even been a humorous April Fools Python Enhancement Proposal (PEP) addressing the BDFL's retirement in the past.

That said, it's interesting that van Rossum did not heed the fifth lesson of Eric S. Raymond from his essay, The Mail Must Get Through (part of The Cathedral & the Bazaar ) , which stipulates: "When you lose interest in a program, your last duty to it is to hand it off to a competent successor." One could certainly argue that letting the community pick its own leadership, though, is an equally-valid choice.

What do you think? Are projects better or worse for being run by a BDFL? What can we expect when a BDFL moves on? And can someone truly step away from their passion project after decades of leading it? Will we still turn to them for the hard decisions, or can a community smoothly transition to new leadership without the pitfalls of forks or lost participants?

Can you truly stop being a BDFL? Or is it a title you'll hold, at least informally, until your death?

About the author: Jason Baker - I use technology to make the world more open. Linux desktop enthusiast. Map/geospatial nerd. Raspberry Pi tinkerer. Data analysis and visualization geek. Occasional coder. Cloud nativist. Civic tech and open government booster.

2 Comments

Mike James on 17 Jul 2018

My take on the issue:
https://www.i-programmer.info/news/216-python/11967-guido-van-rossum-qui...

Maxim Stewart on 05 Aug 2018

"So what are you all going to do? Create a democracy? Anarchy? A dictatorship? A federation?"

Power coalesced to one point is always scary when thought about in the context of succession. A vacuum invites anarchy and I often think about this for when Linus Torvalds leaves the picture. We really have no concrete answers for what is the best way forward but my hope is towards a democratic process. But, as current history indicates, a democracy untended by its citizens invites quite the nightmare and so too does this translate to the keeping up of a project.

[Nov 07, 2019] Manage NTP with Chrony Opensource.com

Nov 07, 2019 | opensource.com

Manage NTP with Chrony: Chronyd is a better choice for most networks than ntpd for keeping computers synchronized with the Network Time Protocol. 03 Dec 2018 | David Both (Community Moderator)


"Does anybody really know what time it is? Does anybody really care?"
-- Chicago, 1969

Perhaps that rock group didn't care what time it was, but our computers do need to know the exact time. Timekeeping is very important to computer networks. In banking, stock markets, and other financial businesses, transactions must be maintained in the proper order, and exact time sequences are critical for that. For sysadmins and DevOps professionals, it's easier to follow the trail of email through a series of servers or to determine the exact sequence of events using log files on geographically dispersed hosts when exact times are kept on the computers in question.

I used to work at an organization that received over 20 million emails per day and had four servers just to accept and do a basic filter on the incoming flood of email. From there, emails were sent to one of four other servers to perform more complex anti-spam assessments, then they were delivered to one of several additional servers where the emails were placed in the correct inboxes. At each layer, the emails would be sent to one of the next-level servers, selected only by the randomness of round-robin DNS. Sometimes we had to trace a new message through the system until we could determine where it "got lost," according to the pointy-haired bosses. We had to do this with frightening regularity.

Most of that email turned out to be spam. Some people actually complained that their [joke, cat pic, recipe, inspirational saying, or other-strange-email]-of-the-day was missing and asked us to find it. We did reject those opportunities.

Our email and other transactional searches were aided by log entries with timestamps that -- today -- can resolve down to the nanosecond in even the slowest of modern Linux computers. In very high-volume transaction environments, even a few microseconds of difference in the system clocks can mean sorting thousands of transactions to find the correct one(s).

The NTP server hierarchy

Computers worldwide use the Network Time Protocol (NTP) to synchronize their times with internet standard reference clocks via a hierarchy of NTP servers. The primary servers are at stratum 1, and they are connected directly to various national time services at stratum 0 via satellite, radio, or even modems over phone lines. The time service at stratum 0 may be an atomic clock, a radio receiver tuned to the signals broadcast by an atomic clock, or a GPS receiver using the highly accurate clock signals broadcast by GPS satellites.

To prevent time requests from time servers lower in the hierarchy (i.e., with a higher stratum number) from overwhelming the primary reference servers, there are several thousand public NTP stratum 2 servers that are open and available for anyone to use. Many organizations with large numbers of hosts that need an NTP server will set up their own time servers so that only one local host accesses the stratum 2 time servers, then they configure the remaining network hosts to use the local time server which, in my case, is a stratum 3 server.

NTP choices

The original NTP daemon, ntpd , has been joined by a newer one, chronyd . Both keep the local host's time synchronized with the time server. Both services are available, and I have seen nothing to indicate that this will change anytime soon.

Chrony has features that make it the better choice for most environments.

The NTP and Chrony RPM packages are available from standard Fedora repositories. You can install both and switch between them, but modern Fedora, CentOS, and RHEL releases have moved from NTP to Chrony as their default time-keeping implementation. I have found that Chrony works well, provides a better interface for the sysadmin, presents much more information, and increases control.

Just to make it clear, NTP is a protocol that is implemented with either NTP or Chrony. If you'd like to know more, read this comparison between NTP and Chrony as implementations of the NTP protocol.

This article explains how to configure Chrony clients and servers on a Fedora host, but the configuration for CentOS and RHEL current releases works the same.

Chrony structure

The Chrony daemon, chronyd , runs in the background and monitors the time and status of the time server specified in the chrony.conf file. If the local time needs to be adjusted, chronyd does it smoothly without the programmatic trauma that would occur if the clock were instantly reset to a new time.

Chrony's chronyc tool allows someone to monitor the current status of Chrony and make changes if necessary. The chronyc utility can be used as a command that accepts subcommands, or it can be used as an interactive text-mode program. This article will explain both uses.

Client configuration

The NTP client configuration is simple and requires little or no intervention. The NTP server can be defined during the Linux installation or provided by the DHCP server at boot time. The default /etc/chrony.conf file (shown below in its entirety) requires no intervention to work properly as a client. For Fedora, Chrony uses the Fedora NTP pool, and CentOS and RHEL have their own NTP server pools. Like many Red Hat-based distributions, the configuration file is well commented.

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool 2.fedora.pool.ntp.org iburst

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).

# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony.keys

# Get TAI-UTC offset and leap seconds from the system tz database.
leapsectz right/UTC

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

Let's look at the current status of NTP on a virtual machine I use for testing. The chronyc command, when used with the tracking subcommand, provides statistics that report how far off the local system is from the reference server.

[root@studentvm1 ~]# chronyc tracking
Reference ID : 23ABED4D (ec2-35-171-237-77.compute-1.amazonaws.com)
Stratum : 3
Ref time (UTC) : Fri Nov 16 16:21:30 2018
System time : 0.000645622 seconds slow of NTP time
Last offset : -0.000308577 seconds
RMS offset : 0.000786140 seconds
Frequency : 0.147 ppm slow
Residual freq : -0.073 ppm
Skew : 0.062 ppm
Root delay : 0.041452706 seconds
Root dispersion : 0.022665167 seconds
Update interval : 1044.2 seconds
Leap status : Normal
[root@studentvm1 ~]#

The Reference ID in the first line of the result is the server the host is synchronized to -- in this case, a stratum 3 reference server that was last contacted by the host at 16:21:30 2018. The other lines are described in the chronyc(1) man page .

The sources subcommand is also useful because it provides information about the time source configured in chrony.conf .

[root@studentvm1 ~]# chronyc sources
210 Number of sources = 5
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.0.51 3 6 377 0 -2613us[-2613us] +/- 63ms
^+ dev.smatwebdesign.com 3 10 377 28m -2961us[-3534us] +/- 113ms
^+ propjet.latt.net 2 10 377 465 -1097us[-1085us] +/- 77ms
^* ec2-35-171-237-77.comput> 2 10 377 83 +2388us[+2395us] +/- 95ms
^+ PBX.cytranet.net 3 10 377 507 -1602us[-1589us] +/- 96ms
[root@studentvm1 ~]#

The first source in the list is the time server I set up for my personal network. The others were provided by the pool. Even though my NTP server doesn't appear in the Chrony configuration file above, my DHCP server provides its IP address for the NTP server. The "S" column -- Source State -- indicates with an asterisk ( * ) the server our host is synced to. This is consistent with the data from the tracking subcommand.

The -v option provides a nice description of the fields in this output.

[root@studentvm1 ~]# chronyc sources -v
210 Number of sources = 5

.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| / '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
|| .- xxxx [ yyyy ] +/- zzzz
|| Reachability register (octal) -. | xxxx = adjusted offset,
|| Log2(Polling interval) --. | | yyyy = measured offset,
|| \ | | zzzz = estimated error.
|| | | \
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.0.51 3 7 377 28 -2156us[-2156us] +/- 63ms
^+ triton.ellipse.net 2 10 377 24 +5716us[+5716us] +/- 62ms
^+ lithium.constant.com 2 10 377 351 -820us[ -820us] +/- 64ms
^* t2.time.bf1.yahoo.com 2 10 377 453 -992us[ -965us] +/- 46ms
^- ntp.idealab.com 2 10 377 799 +3653us[+3674us] +/- 87ms
[root@studentvm1 ~]#

If I wanted my server to be the preferred reference time source for this host, I would add the line below to the /etc/chrony.conf file.

server 192.168.0.51 iburst prefer

I usually place this line just above the first pool server statement near the top of the file. There is no special reason for this, except I like to keep the server statements together. It would work just as well at the bottom of the file, and I have done that on several hosts. This configuration file is not sequence-sensitive.

The prefer option marks this as the preferred reference source. As such, this host will always be synchronized with this reference source (as long as it is available). We can also use the fully qualified hostname for a remote reference server or the hostname only (without the domain name) for a local reference time source as long as the search statement is set in the /etc/resolv.conf file. I prefer the IP address to ensure that the time source is accessible even if DNS is not working. In most environments, the server name is probably the better option, because NTP will continue to work even if the server's IP address changes.

If you don't have a specific reference source you want to synchronize to, it is fine to use the defaults.

Configuring an NTP server with Chrony

The nice thing about the Chrony configuration file is that this single file configures the host as both a client and a server. To add a server function to our host -- it will always be a client, obtaining its time from a reference server -- we just need to make a couple of changes to the Chrony configuration, then configure the host's firewall to accept NTP requests.

Open the /etc/chrony.conf file in your favorite text editor and uncomment the local stratum 10 line. This enables the Chrony NTP server to continue to act as if it were connected to a remote reference server if the internet connection fails; this enables the host to continue to be an NTP server to other hosts on the local network.

Let's restart chronyd and track how the service is working for a few minutes. Before we enable our host as an NTP server, we want to test a bit.

[root@studentvm1 ~]# systemctl restart chronyd ; watch chronyc tracking

The results should look like this. The watch command runs the chronyc tracking command every two seconds so we can watch changes occur over time.

Every 2.0s: chronyc tracking studentvm1: Fri Nov 16 20:59:31 2018

Reference ID : C0A80033 (192.168.0.51)
Stratum : 4
Ref time (UTC) : Sat Nov 17 01:58:51 2018
System time : 0.001598277 seconds fast of NTP time
Last offset : +0.001791533 seconds
RMS offset : 0.001791533 seconds
Frequency : 0.546 ppm slow
Residual freq : -0.175 ppm
Skew : 0.168 ppm
Root delay : 0.094823152 seconds
Root dispersion : 0.021242738 seconds
Update interval : 65.0 seconds
Leap status : Normal

Notice that my NTP server, the studentvm1 host, synchronizes to the host at 192.168.0.51, which is my internal network NTP server, at stratum 4. Synchronizing directly to the Fedora pool machines would result in synchronization at stratum 3. Notice also that the amount of error decreases over time. Eventually, it should stabilize with a tiny variation around a fairly small range of error. The size of the error depends upon the stratum and other network factors. After a few minutes, use Ctrl+C to break out of the watch loop.

To turn our host into an NTP server, we need to allow it to listen on the local network. Uncomment the following line to allow hosts on the local network to access our NTP server.

# Allow NTP client access from local network.
allow 192.168.0.0/16

Note that the server can listen for requests on any local network it's attached to. The IP address in the "allow" line is just intended for illustrative purposes. Be sure to change the IP network and subnet mask in that line to match your local network's.

Restart chronyd .

[root@studentvm1 ~]# systemctl restart chronyd

To allow other hosts on your network to access this server, configure the firewall to allow inbound UDP packets on port 123. Check your firewall's documentation to find out how to do that.
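On a host running firewalld (the default on Fedora, CentOS, and RHEL), for example, something like the following should do it:

firewall-cmd --permanent --add-service=ntp
firewall-cmd --reload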

Testing

Your host is now an NTP server. You can test it with another host or a VM that has access to the network on which the NTP server is listening. Configure the client to use the new NTP server as the preferred server in the /etc/chrony.conf file, then monitor that client using the chronyc tools we used above.

Chronyc as an interactive tool

As I mentioned earlier, chronyc can be used as an interactive command tool. Simply run the command without a subcommand and you get a chronyc command prompt.

[root@studentvm1 ~]# chronyc
chrony version 3.4
Copyright (C) 1997-2003, 2007, 2009-2018 Richard P. Curnow and others
chrony comes with ABSOLUTELY NO WARRANTY. This is free software, and
you are welcome to redistribute it under certain conditions. See the
GNU General Public License version 2 for details.

chronyc>

You can enter just the subcommands at this prompt. Try using the tracking , ntpdata , and sources commands. The chronyc command line allows command recall and editing for chronyc subcommands. You can use the help subcommand to get a list of possible commands and their syntax.

Conclusion

Chrony is a powerful tool for synchronizing the times of client hosts, whether they are all on the local network or scattered around the globe. It's easy to configure because, despite the large number of options available, only a few configurations are required for most circumstances.

After my client computers have synchronized with the NTP server, I like to set the system hardware clock from the system (OS) time by using the following command:

/sbin/hwclock --systohc

This command can be added as a cron job or a script in cron.daily to keep the hardware clock synced with the system time.
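A minimal sketch of such a script (the file name is arbitrary; it just needs to be executable and live in /etc/cron.daily):

#!/bin/bash
# /etc/cron.daily/hwclock-sync -- keep the hardware clock in step with the
# NTP-disciplined system clock
/sbin/hwclock --systohc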

Chrony and NTP (the service) both use the same configuration, and the files' contents are interchangeable. The man pages for chronyd , chronyc , and chrony.conf contain an amazing amount of information that can help you get started or learn about esoteric configuration options.

Do you run your own NTP server? Let us know in the comments and be sure to tell us which implementation you are using, NTP or Chrony.

[Nov 07, 2019] 5 alerting and visualization tools for sysadmins Opensource.com

Nov 07, 2019 | opensource.com

Common types of alerts and visualizations

Alerts

Let's first cover what alerts are not . Alerts should not be sent if the human responder can't do anything about the problem. This includes alerts that are sent to multiple individuals with only a few who can respond, or situations where every anomaly in the system triggers an alert. This leads to alert fatigue and receivers ignoring all alerts within a specific medium until the system escalates to a medium that isn't already saturated.

For example, if an operator receives hundreds of emails a day from the alerting system, that operator will soon ignore all emails from the alerting system. The operator will respond to a real incident only when he or she is experiencing the problem, emailed by a customer, or called by the boss. In this case, alerts have lost their meaning and usefulness.

Alerts are not a constant stream of information or a status update. They are meant to convey a problem from which the system can't automatically recover, and they are sent only to the individual most likely to be able to recover the system. Everything that falls outside this definition isn't an alert and will only damage your employees and company culture.

Everyone has a different set of alert types, so I won't discuss things like priority levels (P1-P5) or models that use words like "Informational," "Warning," and "Critical." Instead, I'll describe the generic categories emergent in complex systems' incident response.

You might have noticed I mentioned an "Informational" alert type right after I wrote that alerts shouldn't be informational. Well, not everyone agrees, but I don't consider something an alert if it isn't sent to anyone. It is a data point that many systems refer to as an alert. It represents some event that should be known but not responded to. It is generally part of the visualization system of the alerting tool and not an event that triggers actual notifications. Mike Julian covers this and other aspects of alerting in his book Practical Monitoring . It's a must read for work in this area.

Non-informational alerts consist of types that can be responded to or require action. I group these into two categories: internal outage and external outage. (Most companies have more than two levels for prioritizing their response efforts.) Degraded system performance is considered an outage in this model, as the impact to each user is usually unknown.

Internal outages are a lower priority than external outages, but they still need to be responded to quickly. They often include internal systems that company employees use or components of applications that are visible only to company employees.

External outages consist of any system outage that would immediately impact a customer. These don't include a system outage that prevents releasing updates to the system. They do include customer-facing application failures, database outages, and networking partitions that hurt availability or consistency if either can impact a user. They also include outages of tools that may not have a direct impact on users, as the application continues to run but this transparent dependency impacts performance. This is common when the system uses some external service or data source that isn't necessary for full functionality but may cause delays as the application performs retries or handles errors from this external dependency.

Visualizations

There are many visualization types, and I won't cover them all here. It's a fascinating area of research. On the data analytics side of my career, learning and applying that knowledge is a constant challenge. We need to provide simple representations of complex system outputs for the widest dissemination of information. Google Charts and Tableau have a wide selection of visualization types. We'll cover the most common visualizations and some innovative solutions for quickly understanding systems.

Line chart

The line chart is probably the most common visualization. It does a pretty good job of producing an understanding of a system over time. A line chart in a metrics system would have a line for each unique metric or some aggregation of metrics. This can get confusing when there are a lot of metrics in the same dashboard (as shown below), but most systems can select specific metrics to view rather than having all of them visible. Also, anomalous behavior is easy to spot if it's significant enough to escape the noise of normal operations. Below we can see purple, yellow, and light blue lines that might indicate anomalous behavior.


Another feature of a line chart is that you can often stack them to show relationships. For example, you might want to look at requests on each server individually, but also in aggregate. This allows you to understand the overall system as well as each instance in the same graph.

Heatmaps

Another common visualization is the heatmap. It is useful when looking at histograms. This type of visualization is similar to a bar chart but can show gradients within the bars representing the different percentiles of the overall metric. For example, suppose you're looking at request latencies and you want to quickly understand the overall trend as well as the distribution of all requests. A heatmap is great for this, and it can use color to disambiguate the quantity of each section with a quick glance.

The heatmap below shows the higher concentration around the centerline of the graph with an easy-to-understand visualization of the distribution vertically for each time bucket. We might want to review a couple of points in time where the distribution gets wide while the others are fairly tight like at 14:00. This distribution might be a negative performance indicator.

Gauges

The last common visualization I'll cover here is the gauge, which helps users understand a single metric quickly. Gauges can represent a single metric, like your speedometer represents your driving speed or your gas gauge represents the amount of gas in your car. Similar to the gas gauge, most monitoring gauges clearly indicate what is good and what isn't. Often (as is shown below), good is represented by green, getting worse by orange, and "everything is breaking" by red. The middle row below shows traditional gauges.

Image source: Grafana.org (© Grafana Labs)

This image shows more than just traditional gauges. The other gauges are single stat representations that are similar to the function of the classic gauge. They all use the same color scheme to quickly indicate system health with just a glance. Arguably, the bottom row is probably the best example of a gauge that allows you to glance at a dashboard and know that everything is healthy (or not). This type of visualization is usually what I put on a top-level dashboard. It offers a full, high-level understanding of system health in seconds.

Flame graphs

A less common visualization is the flame graph, introduced by Netflix's Brendan Gregg in 2011. It's not ideal for dashboarding or quickly observing high-level system concerns; it's normally seen when trying to understand a specific application problem. This visualization focuses on CPU and memory and the associated frames. The X-axis lists the frames alphabetically, and the Y-axis shows stack depth. Each rectangle is a stack frame and includes the function being called. The wider the rectangle, the more it appears in the stack. This method is invaluable when trying to diagnose system performance at the application level and I urge everyone to give it a try.
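If you want to try it, a common way to produce one is with perf plus Brendan Gregg's FlameGraph scripts (this assumes the scripts have been cloned into the current directory):

perf record -F 99 -a -g -- sleep 60                                      # sample all CPUs at 99 Hz for 60 seconds
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu-flame.svg  # fold the stacks and render an SVG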

Image source: Wikimedia.org (Creative Commons BY SA 3.0)

Tool options

There are several commercial options for alerting, but since this is Opensource.com, I'll cover only systems that are being used at scale by real companies that you can use at no cost. Hopefully, you'll be able to contribute new and innovative features to make these systems even better.

Alerting tools

Bosun

If you've ever done anything with computers and gotten stuck, the help you received was probably thanks to a Stack Exchange system. Stack Exchange runs many different websites around a crowdsourced question-and-answer model. Stack Overflow is very popular with developers, and Super User is popular with operations. However, there are now hundreds of sites ranging from parenting to sci-fi and philosophy to bicycles.

Stack Exchange open-sourced its alert management system, Bosun , around the same time Prometheus and its AlertManager system were released. There were many similarities in the two systems, and that's a really good thing. Like Prometheus, Bosun is written in Golang. Bosun's scope is more extensive than Prometheus' as it can interact with systems beyond metrics aggregation. It can also ingest data from log and event aggregation systems. It supports Graphite, InfluxDB, OpenTSDB, and Elasticsearch.

Bosun's architecture consists of a single server binary, a backend like OpenTSDB, Redis, and scollector agents . The scollector agents automatically detect services on a host and report metrics for those processes and other system resources. This data is sent to a metrics backend. The Bosun server binary then queries the backends to determine if any alerts need to be fired. Bosun can also be used by tools like Grafana to query the underlying backends through one common interface. Redis is used to store state and metadata for Bosun.

A really neat feature of Bosun is that it lets you test your alerts against historical data. This was something I missed in Prometheus several years ago, when I had data for an issue I wanted alerts on but no easy way to test it. To make sure my alerts were working, I had to create and insert dummy data. This system alleviates that very time-consuming process.

Bosun also has the usual features like showing simple graphs and creating alerts. It has a powerful expression language for writing alerting rules. However, it only has email and HTTP notification configurations, which means connecting to Slack and other tools requires a bit more customization ( which its documentation covers ). Similar to Prometheus, Bosun can use templates for these notifications, which means they can look as awesome as you want them to. You can use all your HTML and CSS skills to create the baddest email alert anyone has ever seen.

Cabot

Cabot was created by a company called Arachnys . You may not know who Arachnys is or what it does, but you have probably felt its impact: It built the leading cloud-based solution for fighting financial crimes. That sounds pretty cool, right? At a previous company, I was involved in similar functions around "know your customer" laws. Most companies would consider it a very bad thing to be linked to a terrorist group, for example, funneling money through their systems. These solutions also help defend against less-atrocious offenders like fraudsters who could also pose a risk to the institution.

So why did Arachnys create Cabot? Well, it is kind of a Christmas present to everyone, as it was a Christmas project built because its developers couldn't wrap their heads around Nagios . And really, who can blame them? Cabot was written with Django and Bootstrap, so it should be easy for most to contribute to the project. (Another interesting factoid: The name comes from the creator's dog.)

The Cabot architecture is similar to Bosun in that it doesn't collect any data. Instead, it accesses data through the APIs of the tools it is alerting for. Therefore, Cabot uses a pull (rather than a push) model for alerting. It reaches out into each system's API and retrieves the information it needs to make a decision based on a specific check. Cabot stores the alerting data in a Postgres database and also has a cache using Redis.

Cabot natively supports Graphite , but it also supports Jenkins , which is rare in this area. Arachnys uses Jenkins like a centralized cron, but I like this idea of treating build failures like outages. Obviously, a build failure isn't as critical as a production outage, but it could still alert the team and escalate if the failure isn't resolved. Who actually checks Jenkins every time an email comes in about a build failure? Yeah, me too!

Another interesting feature is that Cabot can integrate with Google Calendar for on-call rotations. Cabot calls this feature Rota, which is a British term for a roster or rotation. This makes a lot of sense, and I wish other systems would take this idea further. Cabot doesn't support anything more complex than primary and backup personnel, but there is certainly room for additional features. The docs say if you want something more advanced, you should look at a commercial option.

StatsAgg

StatsAgg ? How did that make the list? Well, it's not every day you come across a publishing company that has created an alerting platform. I think that deserves recognition. Of course, Pearson isn't just a publishing company anymore; it has several web presences and a joint venture with O'Reilly Media . However, I still think of it as the company that published my schoolbooks and tests.

StatsAgg isn't just an alerting platform; it's also a metrics aggregation platform. And it's kind of like a proxy for other systems. It supports Graphite, StatsD, InfluxDB, and OpenTSDB as inputs, but it can also forward those metrics to their respective platforms. This is an interesting concept, but potentially risky as loads increase on a central service. However, if the StatsAgg infrastructure is robust enough, it can still produce alerts even when a backend storage platform has an outage.

StatsAgg is written in Java and consists only of the main server and UI, which keeps complexity to a minimum. It can send alerts based on regular expression matching and is focused on alerting by service rather than host or instance. Its goal is to fill a void in the open source observability stack, and I think it does that quite well.

Visualization tools

Grafana

Almost everyone knows about Grafana , and many have used it. I have used it for years whenever I need a simple dashboard. The tool I used before was deprecated, and I was fairly distraught about that until Grafana made it okay. Grafana was gifted to us by Torkel Ödegaard. Like Cabot, Grafana was also created around Christmastime, and released in January 2014. It has come a long way in just a few years. It started life as a Kibana dashboarding system, and Torkel forked it into what became Grafana.

Grafana's sole focus is presenting monitoring data in a more usable and pleasing way. It can natively gather data from Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. There's an Enterprise version that uses plugins for more data sources, but there's no reason those other data source plugins couldn't be created as open source, as the Grafana plugin ecosystem already offers many other data sources.

What does Grafana do for me? It provides a central location for understanding my system. It is web-based, so anyone can access the information, although it can be restricted using different authentication methods. Grafana can provide knowledge at a glance using many different types of visualizations. However, it has started integrating alerting and other features that aren't traditionally combined with visualizations.

Now you can set alerts visually. That means you can look at a graph, maybe even one showing where an alert should have triggered due to some degradation of the system, click on the graph where you want the alert to trigger, and then tell Grafana where to send the alert. That's a pretty powerful addition that won't necessarily replace an alerting platform, but it can certainly help augment it by providing a different perspective on alerting criteria.

Grafana has also introduced more collaboration features. Users have been able to share dashboards for a long time, meaning you don't have to create your own dashboard for your Kubernetes cluster because there are several already available -- with some maintained by Kubernetes developers and others by Grafana developers.

The most significant addition around collaboration is annotations. Annotations allow a user to add context to part of a graph. Other users can then use this context to understand the system better. This is an invaluable tool when a team is in the middle of an incident and communication and common understanding are critical. Having all the information right where you're already looking makes it much more likely that knowledge will be shared across the team quickly. It's also a nice feature to use during blameless postmortems when the team is trying to understand how the failure occurred and learn more about their system.

Vizceral

Netflix created Vizceral to understand its traffic patterns better when performing a traffic failover. Unlike Grafana, which is a more general tool, Vizceral serves a very specific use case. Netflix no longer uses this tool internally and says it is no longer actively maintained, but it still updates the tool periodically. I highlight it here primarily to point out an interesting visualization mechanism and how it can help solve a problem. It's worth running it in a demo environment just to better grasp the concepts and witness what's possible with these systems.

[Nov 07, 2019] What breaks our systems A taxonomy of black swans Opensource.com

Nov 07, 2019 | opensource.com

What breaks our systems: A taxonomy of black swans. Find and fix outlier events that create issues before they trigger severe production problems. 25 Oct 2018, Laura Nolan


Black swans, by definition, can't be predicted, but sometimes there are patterns we can find and use to create defenses against categories of related problems.

For example, a large proportion of failures are a direct result of changes (code, environment, or configuration). Each bug triggered in this way is distinctive and unpredictable, but the common practice of canarying all changes is somewhat effective against this class of problems, and automated rollbacks have become a standard mitigation.

As our profession continues to mature, other kinds of problems are becoming well-understood classes of hazards with generalized prevention strategies.

Black swans observed in the wild

All technology organizations have production problems, but not all of them share their analyses. The organizations that publicly discuss incidents are doing us all a service. The following incidents each describe one class of problem and are by no means isolated instances. We all have black swans lurking in our systems; it's just that some of us don't know it yet.

Hitting limits


Running headlong into any sort of limit can produce very severe incidents. A canonical example of this was Instapaper's outage in February 2017 . I challenge any engineer who has carried a pager to read the outage report without a chill running up their spine. Instapaper's production database was on a filesystem that, unknown to the team running the service, had a 2TB limit. With no warning, it stopped accepting writes. Full recovery took days and required migrating its database.

Limits can strike in various ways. Sentry hit limits on maximum transaction IDs in Postgres. Platform.sh hit size limits on a pipe buffer. SparkPost triggered AWS's DDoS protection. Foursquare hit a performance cliff when one of its datastores ran out of RAM.

One way to get advance knowledge of system limits is to test periodically. Good load testing (on a production replica) ought to involve write transactions and should involve growing each datastore beyond its current production size. It's easy to forget to test things that aren't your main datastores (such as Zookeeper). If you hit limits during testing, you have time to fix the problems. Given that resolution of limits-related issues can involve major changes (like splitting a datastore), time is invaluable.

When it comes to cloud services, if your service generates unusual loads or uses less widely used products or features (such as older or newer ones), you may be more at risk of hitting limits. It's worth load testing these, too. But warn your cloud provider first.

Finally, where limits are known, add monitoring (with associated documentation) so you will know when your systems are approaching those ceilings. Don't rely on people still being around to remember.
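As a minimal illustration of that kind of ceiling watch (the 80% threshold, the mail command, and the recipient address are assumptions, not part of the original article), a cron-driven shell check might look like this:

    #!/bin/bash
    # Warn when any mounted filesystem crosses an assumed 80% usage ceiling.
    THRESHOLD=80
    df -P -x tmpfs -x devtmpfs | awk 'NR>1 {gsub("%","",$5); print $5, $6}' |
    while read -r used mount; do
        if [ "$used" -ge "$THRESHOLD" ]; then
            echo "WARNING: $mount is at ${used}% of capacity" |
                mail -s "Capacity alert on $(hostname)" admin@example.com
        fi
    done

The same idea applies to any other known ceiling (inode counts, transaction IDs, queue depths): script the check, document the limit next to it, and let the scheduler remember instead of people.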

Spreading slowness
"The world is much more correlated than we give credit to. And so we see more of what Nassim Taleb calls 'black swan events' -- rare events happen more often than they should because the world is more correlated."
-- Richard Thaler

HostedGraphite's postmortem on how an AWS outage took down its load balancers (which are not hosted on AWS) is a good example of just how much correlation exists in distributed computing systems. In this case, the load-balancer connection pools were saturated by slow connections from customers that were hosted in AWS. The same kinds of saturation can happen with application threads, locks, and database connections -- any kind of resource monopolized by slow operations.

HostedGraphite's incident is an example of externally imposed slowness, but often slowness can result from saturation somewhere in your own system creating a cascade and causing other parts of your system to slow down. An incident at Spotify demonstrates such spread -- the streaming service's frontends became unhealthy due to saturation in a different microservice. Enforcing deadlines for all requests, as well as limiting the length of request queues, can prevent such spread. Your service will serve at least some traffic, and recovery will be easier because fewer parts of your system will be broken.

Retries should be limited with exponential backoff and some jitter. An outage at Square, in which its Redis datastore became overloaded due to a piece of code that retried failed transactions up to 500 times with no backoff, demonstrates the potential severity of excessive retries. The Circuit Breaker design pattern can be helpful here, too.
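A minimal sketch of that retry discipline in shell, assuming a hypothetical do_request command that fails transiently and a cap of five attempts:

    #!/bin/bash
    # Retry with exponential backoff plus random jitter instead of hammering a struggling backend.
    max_attempts=5
    delay=1
    for attempt in $(seq 1 "$max_attempts"); do
        if do_request; then        # hypothetical operation that may fail transiently
            exit 0
        fi
        jitter=$(( RANDOM % delay + 1 ))
        sleep $(( delay + jitter ))
        delay=$(( delay * 2 ))     # 1, 2, 4, 8, ... seconds between attempts
    done
    echo "giving up after $max_attempts attempts" >&2
    exit 1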

Dashboards should be designed to clearly show utilization, saturation, and errors for all resources so problems can be found quickly.

Thundering herds

Often, failure scenarios arise when a system is under unusually heavy load. This can arise organically from users, but often it arises from systems. A surge of cron jobs that starts at midnight is a venerable example. Mobile clients can also be a source of coordinated demand if they are programmed to fetch updates at the same time (of course, it is much better to jitter such requests).
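A common way to add that jitter for scheduled jobs is a thin wrapper that sleeps a random interval before doing any real work; the job name and the 30-minute window below are placeholders:

    #!/bin/bash
    # Called from cron at midnight on every host; the random sleep spreads the
    # load over half an hour instead of creating a single synchronized spike.
    sleep $(( RANDOM % 1800 ))
    exec /usr/local/bin/nightly-report.sh   # hypothetical real job

The crontab entry then points at the wrapper (for example, 0 0 * * * root /usr/local/bin/nightly-report-jittered.sh) rather than at the job itself.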

Events occurring at pre-configured times aren't the only source of thundering herds. Slack experienced multiple outages over a short time due to large numbers of clients being disconnected and immediately reconnecting, causing large spikes of load. CircleCI saw a severe outage when a GitLab outage ended, leading to a surge of builds queued in its database, which became saturated and very slow.

Almost any service can be the target of a thundering herd. Planning for such eventualities -- and testing that your plan works as intended -- is therefore a must. Client backoff and load shedding are often core to such approaches.

If your systems must constantly ingest data that can't be dropped, it's key to have a scalable way to buffer this data in a queue for later processing.

Automation systems are complex systems
"Complex systems are intrinsically hazardous systems."
-- Richard Cook, MD

The trend for the past several years has been strongly towards more automation of software operations. Automation of anything that can reduce your system's capacity (e.g., erasing disks, decommissioning devices, taking down serving jobs) needs to be done with care. Accidents (due to bugs or incorrect invocations) with this kind of automation can take down your system very efficiently, potentially in ways that are hard to recover from.

Christina Schulman and Etienne Perot of Google describe some examples in their talk Help Protect Your Data Centers with Safety Constraints . One incident sent Google's entire in-house content delivery network (CDN) to disk-erase.

Schulman and Perot suggest using a central service to manage constraints, which limits the pace at which destructive automation can operate, and being aware of system conditions (for example, avoiding destructive operations if the service has recently had an alert).

Automation systems can also cause havoc when they interact with operators (or with other automated systems). Reddit experienced a major outage when its automation restarted a system that operators had stopped for maintenance. Once you have multiple automation systems, their potential interactions become extremely complex and impossible to predict.

It will help to deal with the inevitable surprises if all this automation writes logs to an easily searchable, central place. Automation systems should always have a mechanism to allow them to be quickly turned off (fully or only for a subset of operations or targets).

Defense against the dark swans

These are not the only black swans that might be waiting to strike your systems. There are many other kinds of severe problem that can be avoided using techniques such as canarying, load testing, chaos engineering, disaster testing, and fuzz testing -- and of course designing for redundancy and resiliency. Even with all that, at some point your system will fail.

To ensure your organization can respond effectively, make sure your key technical staff and your leadership have a way to coordinate during an outage. For example, one unpleasant issue you might have to deal with is a complete outage of your network. It's important to have a fail-safe communications channel completely independent of your own infrastructure and its dependencies. For instance, if you run on AWS, using a service that also runs on AWS as your fail-safe communication method is not a good idea. A phone bridge or an IRC server that runs somewhere separate from your main systems is good. Make sure everyone knows what the communications platform is and practices using it.

Another principle is to ensure that your monitoring and your operational tools rely on your production systems as little as possible. Separate your control and your data planes so you can make changes even when systems are not healthy. Don't use a single message queue for both data processing and config changes or monitoring, for example -- use separate instances. In SparkPost: The Day the DNS Died , Jeremy Blosser presents an example where critical tools relied on the production DNS setup, which failed.

The psychology of battling the black swan

Dealing with major incidents in production can be stressful. It really helps to have a structured incident-management process in place for these situations. Many technology organizations (including Google) successfully use a version of FEMA's Incident Command System. There should be a clear way for any on-call individual to call for assistance in the event of a major problem they can't resolve alone.

For long-running incidents, it's important to make sure people don't work for unreasonable lengths of time and get breaks to eat and sleep (uninterrupted by a pager). It's easy for exhausted engineers to make a mistake or overlook something that might resolve the incident faster.

Learn more

There are many other things that could be said about black (or formerly black) swans and strategies for dealing with them. If you'd like to learn more, I highly recommend these two books dealing with resilience and stability in production: Susan Fowler's Production-Ready Microservices and Michael T. Nygard's Release It! .


Laura Nolan will present What Breaks Our Systems: A Taxonomy of Black Swans at LISA18 , October 29-31 in Nashville, Tennessee, USA.

[Nov 07, 2019] Linux commands to display your hardware information

Nov 07, 2019 | opensource.com

Get the details on what's inside your computer from the command line. 16 Sep 2019, Howard Fosdick


The easiest way to do that is with one of the standard Linux GUI hardware-information programs.

Alternatively, you could open up the box and read the labels on the disks, memory, and other devices. Or you could enter the boot-time panels -- the so-called UEFI or BIOS panels. Just hit the proper program function key during the boot process to access them. These two methods give you hardware details but omit software information.

Or, you could issue a Linux line command. Wait a minute, that sounds difficult. Why would you do this?

The Linux Terminal

Sometimes it's easy to find a specific bit of information through a well-targeted line command. Perhaps you don't have a GUI program available or don't want to install one.

Probably the main reason to use line commands is for writing scripts. Whether you employ the Linux shell or another programming language, scripting typically requires coding line commands.

Many line commands for detecting hardware must be issued under root authority. So either switch to the root user ID, or issue the command under your regular user ID preceded by sudo :

sudo <the_line_command>

and respond to the prompt for the root password.

This article introduces many of the most useful line commands for system discovery. The quick reference chart at the end summarizes them.

Hardware overview

There are several line commands that will give you a comprehensive overview of your computer's hardware.

The inxi command lists details about your system, CPU, graphics, audio, networking, drives, partitions, sensors, and more. Forum participants often ask for its output when they're trying to help others solve problems. It's a standard diagnostic for problem-solving:

inxi -Fxz

The -F flag means you'll get full output, x adds details, and z masks out personally identifying information like MAC and IP addresses.

The hwinfo and lshw commands display much of the same information in different formats:

hwinfo --short

or

lshw -short

The long forms of these two commands spew out exhaustive -- but hard to read -- output:

hwinfo

or

lshw
CPU details

You can learn everything about your CPU through line commands. View CPU details by issuing either the lscpu command or its close relative lshw :

lscpu

or

lshw -C cpu

In both cases, the last few lines of output list all the CPU's capabilities. Here you can find out whether your processor supports specific features.

With all these commands, you can reduce verbiage and narrow any answer down to a single detail by parsing the command output with the grep command. For example, to view only the CPU make and model:

lshw -C cpu | grep -i product

To view just the CPU's speed in megahertz:

lscpu | grep -i mhz

or its BogoMips power rating:

lscpu | grep -i bogo

The -i flag on the grep command simply ensures your search ignores whether the output it searches is upper or lower case.

Memory

Linux line commands enable you to gather all possible details about your computer's memory. You can even determine whether you can add extra memory to the computer without opening up the box.

To list each memory stick and its capacity, issue the dmidecode command:

dmidecode -t memory | grep -i size

For more specifics on system memory, including type, size, speed, and voltage of each RAM stick, try:

lshw -short -C memory

One thing you'll surely want to know is the maximum amount of memory you can install on your computer:

dmidecode -t memory | grep -i max

Now find out whether there are any open slots to insert additional memory sticks. You can do this without opening your computer by issuing this command:

lshw -short -C memory | grep -i empty

A null response means all the memory slots are already in use.

Determining how much video memory you have requires a pair of commands. First, list all devices with the lspci command and limit the output displayed to the video device you're interested in:

lspci | grep -i vga

The output line that identifies the video controller will typically look something like this:

00:02.0 VGA compatible controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)

Now reissue the lspci command, referencing the video device number as the selected device:

lspci -v -s 00:02.0

The output line identified as prefetchable is the amount of video RAM on your system:

...
Memory at f0100000 (32-bit, non-prefetchable) [size=512K]
I/O ports at 1230 [size=8]
Memory at e0000000 (32-bit, prefetchable) [size=256M]
Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
...

Finally, to show current memory use in megabytes, issue:

free -m

This tells how much memory is free, how much is in use, the size of the swap area, and whether it's being used. For example, the output might look like this:

              total        used        free      shared  buff/cache   available
Mem:          11891        1326        8877         212        1687       10077
Swap:          1999           0        1999

The top command gives you more detail on memory use. It shows current overall memory and CPU use and also breaks it down by process ID, user ID, and the commands being run. It displays full-screen text output:

top
Disks, filesystems, and devices

You can easily determine whatever you wish to know about disks, partitions, filesystems, and other devices.

To display a single line describing each disk device:

lshw -short -C disk

Get details on any specific SATA disk, such as its model and serial numbers, supported modes, sector count, and more with:

hdparm -i /dev/sda

Of course, you should replace sda with sdb or another device mnemonic if necessary.

To list all disks with all their defined partitions, along with the size of each, issue:

lsblk

For more detail, including the number of sectors, size, filesystem ID and type, and partition starting and ending sectors:

fdisk -l

To start up Linux, you need to identify mountable partitions to the GRUB bootloader. You can find this information with the blkid command. It lists each partition's unique identifier (UUID) and its filesystem type (e.g., ext3 or ext4):

blkid

To list the mounted filesystems, their mount points, and the space used and available for each (in megabytes):

df -m

Finally, you can list details for all USB and PCI buses and devices with these commands:

lsusb

or

lspci
Network

Linux offers tons of networking line commands. Here are just a few.

To see hardware details about your network card, issue:

lshw -C network

Traditionally, the command to show network interfaces was ifconfig :

ifconfig -a

But many people now use:

ip link show

or

netstat -i

In reading the output, it helps to know common network abbreviations:

Abbreviation Meaning
lo Loopback interface
eth0 or enp* Ethernet interface
wlan0 Wireless interface
ppp0 Point-to-Point Protocol interface (used by a dial-up modem, PPTP VPN connection, or USB modem)
vboxnet0 or vmnet* Virtual machine interface

The asterisks in this table are wildcard characters, serving as a placeholder for whatever series of characters appear from system to system.

To show your default gateway and routing tables, issue either of these commands:

ip route | column -t

or

netstat -r
Software

Let's conclude with two commands that display low-level software details. For example, what if you want to know whether you have the latest firmware installed? This command shows the UEFI or BIOS date and version:

dmidecode -t bios

What is the kernel version, and is it 64-bit? And what is the network hostname? To find out, issue:

uname -a
Quick reference chart

This chart summarizes all the commands covered in this article:

Display info about all hardware inxi -Fxz --or--
hwinfo --short --or--
lshw -short
Display all CPU info lscpu --or--
lshw -C cpu
Show CPU features (e.g., PAE, SSE2) lshw -C cpu | grep -i capabilities
Report whether the CPU is 32- or 64-bit lshw -C cpu | grep -i width
Show current memory size and configuration dmidecode -t memory | grep -i size --or--
lshw -short -C memory
Show maximum memory for the hardware dmidecode -t memory | grep -i max
Determine whether memory slots are available lshw -short -C memory | grep -i empty
(a null answer means no slots available)
Determine the amount of video memory lspci | grep -i vga
then reissue with the device number;
for example: lspci -v -s 00:02.0
The VRAM is the prefetchable value.
Show current memory use free -m --or--
top
List the disk drives lshw -short -C disk
Show detailed information about a specific disk drive hdparm -i /dev/sda
(replace sda if necessary)
List information about disks and partitions lsblk (simple) --or--
fdisk -l (detailed)
List partition IDs (UUIDs) blkid
List mounted filesystems, their mount points,
and megabytes used and available for each
df -m
List USB devices lsusb
List PCI devices lspci
Show network card details lshw -C network
Show network interfaces ifconfig -a --or--
ip link show --or--
netstat -i
Display routing tables ip route | column -t --or--
netstat -r
Display UEFI/BIOS info dmidecode -t bios
Show kernel version, network hostname, more uname -a

Do you have a favorite command that I overlooked? Please add a comment and share it.

[Nov 07, 2019] 13 open source backup solutions Opensource.com

Nov 07, 2019 | opensource.com

13 open source backup solutions: Readers suggest more than a dozen of their favorite solutions for protecting data. 07 Mar 2019, Don Watkins (Community Moderator)

We recently published a poll that asked readers to vote on their favorite open source backup solution. We offered six solutions recommended by our moderator community -- Cronopete, Deja Dup, Rclone, Rdiff-backup, Restic, and Rsync -- and invited readers to share other options in the comments. And you came through, offering 13 other solutions (so far) that we either hadn't considered or hadn't even heard of.

By far the most popular suggestion was BorgBackup . It is a deduplicating backup solution that features compression and encryption. It is supported on Linux, MacOS, and BSD and has a BSD License.
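For orientation, a minimal BorgBackup session looks roughly like the following; the repository path and source directory are placeholders, and the compression and encryption choices are just one reasonable set of options:

    borg init --encryption=repokey /backup/borg-repo           # one-time repository setup
    borg create --stats --compression lz4 \
        /backup/borg-repo::home-{now} /home                    # deduplicated, compressed archive
    borg list /backup/borg-repo                                # list archives in the repository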

Second was UrBackup , which does full and incremental image and file backups; you can save whole partitions or single directories. It has clients for Windows, Linux, and MacOS and has a GNU Affero Public License.

Third was LuckyBackup ; according to its website, "it is simple to use, fast (transfers over only changes made and not all data), safe (keeps your data safe by checking all declared directories before proceeding in any data manipulation), reliable, and fully customizable." It carries a GNU Public License.

Casync is content-addressable synchronization -- it's designed for backup and synchronization, and it stores and retrieves multiple related versions of large file systems. It is licensed with the GNU Lesser Public License.

Syncthing synchronizes files between two computers. It is licensed with the Mozilla Public License and, according to its website, is secure and private. It works on MacOS, Windows, Linux, FreeBSD, Solaris, and OpenBSD.

Duplicati is a free backup solution that works on Windows, MacOS, and Linux and supports a variety of standard protocols, such as FTP, SSH, and WebDAV, as well as cloud services. It features strong encryption and is licensed with the GPL.

Dirvish is a disk-based virtual image backup system licensed under OSL-3.0. It also requires Rsync, Perl5, and SSH to be installed.

Bacula 's website says it "is a set of computer programs that permits the system administrator to manage backup, recovery, and verification of computer data across a network of computers of different kinds." It is supported on Linux, FreeBSD, Windows, MacOS, OpenBSD, and Solaris and the bulk of its source code is licensed under AGPLv3.

BackupPC "is a high-performance, enterprise-grade system for backing up Linux, Windows, and MacOS PCs and laptops to a server's disk," according to its website. It is licensed under the GPLv3.

Amanda is a backup system written in C and Perl that allows a system administrator to back up an entire network of client machines to a single server using tape, disk, or cloud-based systems. It was developed and copyrighted in 1991 at the University of Maryland and has a BSD-style license.

Back in Time is a simple backup utility designed for Linux. It provides a command line client and a GUI, both written in Python. To do a backup, just specify where to store snapshots, what folders to back up, and the frequency of the backups. BackInTime is licensed with GPLv2.

Timeshift is a backup utility for Linux that is similar to System Restore for Windows and Time Capsule for MacOS. According to its GitHub repository, "Timeshift protects your system by taking incremental snapshots of the file system at regular intervals. These snapshots can be restored at a later date to undo all changes to the system."

Kup is a backup solution that was created to help users back up their files to a USB drive, but it can also be used to perform network backups. According to its GitHub repository, "When you plug in your external hard drive, Kup will automatically start copying your latest changes."

[Nov 06, 2019] Destroying multiple production databases by Jan Gerrit Kootstra

Aug 08, 2019 | www.redhat.com
In my 22-year career as an IT specialist, I encountered two major issues where -- due to my mistakes -- important production databases were blown apart. Here are my stories.

Freshman mistake

The first time was in the late 1990s when I started working at a service provider for my local municipality's social benefit agency. I got an assignment as a newbie system administrator to remove retired databases from the server where databases for different departments were consolidated.

Due to a typing error on a top-level directory, I removed two live database files instead of the one retired database. What was worse, due to the complexity of the database consolidation, other databases were hit during the restore, too. Repairing all databases took approximately 22 hours.

What helped

A good backup that was tested each night, right after it was made, by recovering an empty file placed at the end of the tar archive catalog.
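A rough sketch of that kind of nightly self-test, assuming a tar-based backup; the paths and the marker file are illustrative, since the real setup recovered the last entry of the archive catalog:

    #!/bin/bash
    # After the nightly backup is written, append a zero-byte marker as the last
    # entry and prove that the end of the archive can still be read back.
    touch /var/backup/.backup-canary
    tar -rf /var/backup/nightly.tar -C / var/backup/.backup-canary
    mkdir -p /tmp/restore-test
    if tar -xf /var/backup/nightly.tar -C /tmp/restore-test var/backup/.backup-canary; then
        echo "backup archive is readable to the end"
    else
        echo "backup verification FAILED" >&2
    fi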

Future-looking statement

It's important to learn from our mistakes. What I learned is this:

Senior sysadmin mistake

In a period where partly offshoring IT activities was common practice in order to reduce costs, I had to take over a database filesystem extension on a Red Hat 5 cluster. Given that I set up this system a couple of years before, I had not checked the current situation.

I assumed the offshore team was familiar with the need to attach all shared LUNs to both nodes of the two-node cluster. My bad, never assume. As an Australian tourist once put it when a friend and I were on vacation in Ireland after my Latin grammar school graduation: "Do not make an arse out of you and me." Or, another phrase: "Assuming is the mother of all mistakes."

Well, I fell for my own trap. I went for the filesystem extension on the active node, and without checking the passive node's ( node2 ) status, tested a failover. Because we had agreed to run the database on node2 until the next update window, I had put myself in trouble.

As the databases started to fail, we brought the database cluster down. No issues yet, but all hell broke loose when I ran a filesystem check on an LVM-based system with missing physical volumes.

Looking back

Looking back, I would call myself stupid. Running pvs, lvs, or vgs would have alerted me that LVM had detected issues. Also, comparing the multipath configuration files would have revealed probable issues.

So, next time, I would first check whether LVM reports any issues before going for the last resort: a filesystem check and trying to fix the millions of errors. Most of the time you will destroy files anyway.
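A quick pre-flight check along those lines, run on both cluster nodes before touching fsck (device and volume names are illustrative):

    pvs                                         # warns loudly about missing physical volumes
    vgs -o vg_name,vg_attr,pv_count,lv_count    # a 'p' in vg_attr marks a partial volume group
    lvs -a -o lv_name,vg_name,lv_attr,devices   # shows which devices back each logical volume
    multipath -ll                               # compare the multipath view with the other node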

What saved my day

What saved my day back then was:

Future-looking statement

I definitely learned some things. For example, always check the environment you're about to work on before any change. Never assume that you know how an environment looks -- change is a constant in IT.

Also, share what you learned from your mistakes. Train offshore colleagues instead of blaming them, and inform them about the impact the issue had on the customer's business. A continent's major transport hub cannot be put on hold due to a sysadmin's mistake.

A shutdown of the transport hub might have been needed if we had failed to solve the issue and the backup site, hosted in another service provider's data centre, had been hit as well. Part of the hub is a harbour, and we could have caused a disaster next to a village of about 10,000 people if, say, a cotton ship and an oil tanker had disappeared from the harbour master's map and collided.

General lessons learned

I learned some important lessons overall from these and other mistakes:

I cannot stress this enough: Learn from your mistakes to avoid them in the future, rather than learning how to make them on a weekly basis. -- Jan Gerrit Kootstra, Solution Designer (for Telco network services), Red Hat Accelerator

[Nov 02, 2019] LVM spanning over multiple disks What disk is a file on? Can I lose a drive without total loss

Notable quotes:
"... If you lose a drive in a volume group, you can force the volume group online with the missing physical volume, but you will be unable to open the LV's that were contained on the dead PV, whether they be in whole or in part. ..."
"... So, if you had for instance 10 LV's, 3 total on the first drive, #4 partially on first drive and second drive, then 5-7 on drive #2 wholly, then 8-10 on drive 3, you would be potentially able to force the VG online and recover LV's 1,2,3,8,9,10.. #4,5,6,7 would be completely lost. ..."
"... LVM doesn't really have the concept of a partition it uses PVs (Physical Volumes), which can be a partition. These PVs are broken up into extents and then these are mapped to the LVs (Logical Volumes). When you create the LVs you can specify if the data is striped or mirrored but the default is linear allocation. So it would use the extents in the first PV then the 2nd then the 3rd. ..."
"... As Peter has said the blocks appear as 0's if a PV goes missing. So you can potentially do data recovery on files that are on the other PVs. But I wouldn't rely on it. You normally see LVM used in conjunction with RAIDs for this reason. ..."
"... it's effectively as if a huge chunk of your disk suddenly turned to badblocks. You can patch things back together with a new, empty drive to which you give the same UUID, and then run an fsck on any filesystems on logical volumes that went across the bad drive to hope you can salvage something. ..."
Mar 16, 2015 | serverfault.com

LVM spanning over multiple disks: What disk is a file on? Can I lose a drive without total loss?

I have three 990GB partitions over three drives in my server. Using LVM, I can create one ~3TB partition for file storage.

1) How does the system determine what partition to use first?
2) Can I find what disk a file or folder is physically on?
3) If I lose a drive in the LVM, do I lose all data, or just data physically on that disk?

  1. The system fills from the first disk in the volume group to the last, unless you configure striping with extents.
  2. I don't think this is possible, but where I'd start to look is in the lvs/vgs commands man pages.
  3. If you lose a drive in a volume group, you can force the volume group online with the missing physical volume, but you will be unable to open the LV's that were contained on the dead PV, whether they be in whole or in part.
  4. So, if you had for instance 10 LV's, 3 total on the first drive, #4 partially on first drive and second drive, then 5-7 on drive #2 wholly, then 8-10 on drive 3, you would be potentially able to force the VG online and recover LV's 1,2,3,8,9,10.. #4,5,6,7 would be completely lost.
-- Peter Grace

1) How does the system determine what partition to use first?

LVM doesn't really have the concept of a partition it uses PVs (Physical Volumes), which can be a partition. These PVs are broken up into extents and then these are mapped to the LVs (Logical Volumes). When you create the LVs you can specify if the data is striped or mirrored but the default is linear allocation. So it would use the extents in the first PV then the 2nd then the 3rd.

2) Can I find what disk a file or folder is physically on?

You can determine what PVs a LV has allocation extents on. But I don't know of a way to get that information for an individual file.
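For the per-LV part of the question, commands like these show which physical volumes hold a given logical volume's extents (the volume group, LV, and device names are examples):

    lvs -o lv_name,vg_name,devices       # one line per segment, with the backing PV(s)
    lvdisplay -m /dev/vg_data/lv_files   # detailed logical-extent to physical-extent mapping
    pvdisplay -m /dev/sdb1               # the reverse view: which LVs use extents on this PV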

3) If I lose a drive in the LVM, do I lose all data, or just data physically on that disk?

As Peter has said the blocks appear as 0's if a PV goes missing. So you can potentially do data recovery on files that are on the other PVs. But I wouldn't rely on it. You normally see LVM used in conjunction with RAIDs for this reason.

-- 3dinfluence

I don't know the answer to #2, so I'll leave that to someone else. I suspect "no", but I'm willing to be happily surprised.

1 is: you tell it, when you combine the physical volumes into a volume group.

3 is: it's effectively as if a huge chunk of your disk suddenly turned to badblocks. You can patch things back together with a new, empty drive to which you give the same UUID, and then run an fsck on any filesystems on logical volumes that went across the bad drive to hope you can salvage something.

And to the overall, unasked question: yeah, you probably don't really want to do that.

[Nov 02, 2019] Raid-5 is obsolete if you use large drives, such as 2TB or 3TB disks. You should instead use raid-6 (two disks can fail)

Notable quotes:
"... RAID5 can survive a single drive failure. However, once you replace that drive, it has to be initialized. Depending on the controller and other things, this can take anywhere from 5-18 hours. During this time, all drives will be in constant use to re-create the failed drive. It is during this time that people worry that the rebuild would cause another drive near death to die, causing a complete array failure. ..."
"... If during a rebuild one of the remaining disks experiences BER, your rebuild stops and you may have headaches recovering from such a situation, depending on controller design and user interaction. ..."
"... RAID5 + a GOOD backup is something to consider, though. ..."
"... Raid-5 is obsolete if you use large drives , such as 2TB or 3TB disks. You should instead use raid-6 ..."
"... RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will be typically worse (although it's not theoretically much worse, since the parity operations are in parallel). ..."
Oct 03, 2019 | hardforum.com

RAID5 can survive a single drive failure. However, once you replace that drive, it has to be initialized. Depending on the controller and other things, this can take anywhere from 5-18 hours. During this time, all drives will be in constant use to re-create the failed drive. It is during this time that people worry that the rebuild would cause another drive near death to die, causing a complete array failure.

This isn't the only danger. The problem with 2TB disks, especially if they are not 4K-sector disks, is that they have a relatively high BER for their capacity, so the likelihood of a BER actually occurring and translating into an unreadable sector is something to worry about.

If during a rebuild one of the remaining disks experiences BER, your rebuild stops and you may have headaches recovering from such a situation, depending on controller design and user interaction.

So I would say that with modern high-BER drives:

So essentially you'll lose one parity disk alone for the BER issue. Not everyone will agree with my analysis, but considering RAID5 with today's high-capacity drives 'safe' is open for debate.

RAID5 + a GOOD backup is something to consider, though.

  1. So you're saying BER is the error count that 'escapes' the ECC correction? I do not believe that is correct, but I'm open to good arguments or links.

    As I understand it, the BER is what prompts bad sectors: when the number of errors exceeds the ECC's correcting ability, you will have an unrecoverable sector (Current Pending Sector in SMART output).

    Also these links are interesting in this context:

    http://blog.econtech.selfip.org/200...s-not-fully-readable-a-lawsuit-in-the-making/

    The short story first: Your consumer-level 1TB SATA drive has a 44% chance that it can be completely read without any error. If you run a RAID setup, this is really bad news because it may prevent rebuilding an array in the case of disk failure, making your RAID not so Redundant.

    Not sure about the numbers the article comes up with, though.

    Also this one is interesting:
    http://lefthandnetworks.typepad.com/virtual_view/2008/02/what-does-data.html

    BER simply means that while reading your data from the disk drive you will get an average of one non-recoverable error in so many bits read, as specified by the manufacturer.

    Rebuilding the data on a replacement drive with most RAID algorithms requires that all the other data on the other drives be pristine and error free. If there is a single error in a single sector, then the data for the corresponding sector on the replacement drive cannot be reconstructed, and therefore the RAID rebuild fails and data is lost. The frequency of this disastrous occurrence is derived from the BER. Simple calculations will show that the chance of data loss due to BER is much greater than all other reasons combined.

    These links do suggest that BER works to produce unrecoverable sectors, and not to 'escape' them as 'undetected' bad sectors, if I understood you correctly.
  1. parityOCP said:
    That guy's a bit of a scaremonger to be honest. He may have a point with consumer drives, but the article is sensationalised to a certain degree. However, there are still a few outfits that won't go past 500GB/drive in an array (even with enterprise drives), simply to reduce the failure window during a rebuild.
    Why is he a scaremonger? He is correct. Have you read his article? In fact, he has copied his argument from Adam Leventhal, who was one of the ZFS developers, I believe.

    Adam's argument goes like this:
    Disks are getting larger all the time; in fact, storage increases exponentially. At the same time, bandwidth is not increasing nearly as fast -- we are still at about 100MB/sec even after decades. So bandwidth has increased maybe 20x over decades, while storage has increased from 10MB to 3TB, i.e. 300,000 times.

    The trend is clear. In the future when we have 10TB drives, they will not be much faster than today. This means that repairing a raid with 3TB disks today will take several days, maybe even a week. With 10TB drives, it will take several weeks, maybe a month.

    Repairing a raid stresses the other disks a lot, which means they can break too. Experienced sysadmins report that this happens quite often during a repair, maybe because those disks come from the same batch and share the same weakness. Some sysadmins therefore mix disks from different vendors and batches.

    Hence, I would not want to run a raid with 3TB disks and only use raid-5. During those days of rebuilding, if just one more disk crashes, you have lost all your data.

    Hence, that article is correct, and he is not a scaremonger. Raid-5 is obsolete if you use large drives, such as 2TB or 3TB disks. You should instead use raid-6 (two disks can fail). That is the conclusion of the article: use raid-6 with large disks, forget raid-5. This is true, and not scaremongery.

    In fact, ZFS therefore has something called raidz3, which means that three disks can fail without problems. To the OP: no, raid-5 is not safe. Neither is raid-6, because neither of them can always repair, or even detect, corrupted data. There are cases when they don't even notice that you got corrupted bits. See my other thread for more information about this. That is the reason people are switching to ZFS, which always CAN detect and repair those corrupted bits. I suggest you sell your hardware raid card and use ZFS, which requires no hardware raid. ZFS just uses JBOD.

    Here are research papers on raid-5, raid-6 and ZFS and corruption:
    http://hardforum.com/showpost.php?p=1036404173&postcount=73

  1. brutalizer said:
    The trend is clear. In the future when we have 10TB drives, they will not be much faster than today. This means, to repair a raid with 3TB disks today, will take several days, maybe even one week. With 10TB drives, it will take several weeks, maybe a month.
    While I agree with the general claim that the larger HDDs (1.5, 2, 3TBs) are best used in RAID 6, your claim about rebuild times is way off.

    I think it is not unreasonable to assume that the 10TB drives will be able to read and write at 200 MB/s or more. We already have 2TB drives with 150MB/s sequential speeds, so 200 MB/s is actually a conservative estimate.

    10e12/200e6 = 50,000 secs = 13.9 hours. Even if there is 100% overhead (half the throughput), that is less than 28 hours to do the rebuild. It is a long time, but it is nowhere near a month! Try to ground your claims in reality.

    And you have again made the false claim that "ZFS - which always CAN detect and repair those corrupted bits". ZFS can usually detect corrupted bits, and can usually correct them if you have duplication or parity, but nothing can always detect and repair. ZFS is safer than many alternatives, but nothing is perfectly safe. Corruption can and has happened with ZFS, and it will happen again.
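For reference, the ZFS side of that argument rests on routine scrubs; a minimal check on a hypothetical pool named tank looks like this:

    zpool scrub tank        # read every block, verify checksums, repair from redundancy where possible
    zpool status -v tank    # scrub progress, repaired bytes, and any files with permanent errors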

Is RAID5 safe with Five 2TB Hard Drives? | [H]ard|Forum

https://hardforum.com/threads/is-raid5-safe-with-five-2tb-hard-drives.1560198/


RAID 5 Data Recovery How to Rebuild a Failed RAID 5 - YouTube

RAID 5 vs RAID 10: Recommended RAID For Safety and Performance

https://www.cyberciti.biz/tips/raid5-vs-raid-10-safety-performance.html

RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will be typically worse (although it's not theoretically much worse, since the parity operations are in parallel).

[Oct 22, 2019] Computer Historians Crack Passwords of Unix's Early Pioneers

Oct 22, 2019 | tech.slashdot.org

(boingboing.net)

Posted by BeauHD on Thursday October 10, 2019 @05:00PM from the cracking-the-code dept. JustAnotherOldGuy shares a report from Boing Boing:

Early versions of the free/open Unix variant BSD came with password files that included hashed passwords for such Unix luminaries as Dennis Ritchie, Stephen R. Bourne, Eric Schmidt, Brian W. Kernighan and Stuart Feldman. Leah Neukirchen recovered a BSD version 3 source tree and revealed that she was able to crack many of the weak passwords used by the equally weak hashing algorithm from those bygone days.

Dennis MacAlistair Ritchie's was "dmac," Bourne's was "bourne," Schmidt's was "wendy!!!" (his wife's name), Feldman's was "axlotl," and Kernighan's was "/.,/.,." Four more passwords were cracked by Arthur Krewat: Özalp Babaoğlu's was "12ucdort," Howard Katseff's was "graduat;," Tom London's was "..pnn521," Bob Fabry's was "561cml.." and Ken Thompson's was "p/q2-q4!" (chess notation for a common opening move). BSD 3 used Descrypt for password hashing, which limited passwords to eight characters, salted with 12 bits of entropy.
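You can still reproduce that descrypt behaviour today on systems whose OpenSSL build retains the legacy -crypt mode (the salt below is arbitrary): anything beyond the eighth character of the password is silently ignored, which is part of what made these hashes so crackable.

    openssl passwd -crypt -salt ab 'p/q2-q4!'         # classic DES-based crypt(3), 2-character salt
    openssl passwd -crypt -salt ab 'p/q2-q4!ignored'  # produces the same hash: only 8 characters count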

Re: Only paranoid people need secure passwords (by lgw):

Kernighan's password was rather good. It's a sad state of affairs that it would be blocked by so many websites today.

Re: Only paranoid people need secure passwords (by yorgasor):

It's rare that websites are brute force hacked. Usually people gain access via malware or some other security hole, escalate their privileges and then grab a copy of the password hashes. Then they can run the password file through a list of other known passwords to catch the low hanging fruit, then use various other brute force attacks to try to get the rest. If you've got a difficult enough password, they'll give up on it and focus on the easier ones to crack. But if their password hashes also comes with account names (often email addresses), then they can try accessing lots of other websites with that email/password combo, which is why it's dangerous to reuse your passwords.

Re: password (by vittico):

well... Bourne used bourne just saying...

Re: password (by 93 Escort Wagon):

To be fair, he wasn't using his name - he was using the name of his preferred shell.

done back in the 1980's, probably (by david_bonn):

It was well-understood by the mid 1980's that the 12-bit salting scheme was breakable with existing hardware. That is why everyone quietly moved to larger salts during that time period.

With reasonable coding assumptions it was possible to crack most any password in 3-5 days on an early 68k box (e.g. Masscomp or Codata). No, I don't have the code any longer.

My understanding is that Morris modified DES for use in passwd(5) so that you couldn't use hardware DES to brute-force decrypt passwords.

Unfortunately I suspect he introduced a vulnerability because apparently the hash leaked information about the key, and since the high-order bits of a DES key are parity bits you could use that as a prybar to narrow your search space.

Different world back then (by Mike Van Pelt):

Security really wasn't at a high premium back then. The need was also less. You might have a prankster get into your account for a practical joke, but those pranksters probably had root access anyway. The computer wasn't being used for financial transactions or anything like it; the most expensive thing you could do was swipe a copy of AT&T Unix.

Back in the '70s on the Univac 1100/80, my password was zxcv. That was at work -- professional environment, no internet, no dialup, so the only threats would be internal.

And when my boss decided to play a practical joke on my account, he just used the Univac equivalent of root access. As did I, with my retaliation.

After 50 years ... (by Retired ICS):

So after 50 years they managed to crack a password. News at 11. Seems to me that the original algorithm was pretty damn secure!

Re: After 50 years ... (by ebvwfbw):

Not really. Some of us cracked those passwords more than 20 years ago when John came out.

Good enough in the day (by Shirley Marquez):

Most of those passwords were good enough for their time and place. The kind of brute force computation that the researchers used would have been prohibitively expensive back then. The passwords were not protecting anything of great value so there was little incentive to crack them. (Some of the code those people were working on later came to be quite valuable, but it was not perceived as having monetary value at the time.) Unless you had access to one of the relatively few ARPANET sites, you would have needed physical access to the computers to attempt to crack them.

[Oct 22, 2019] Flaw In Sudo Enables Non-Privileged Users To Run Commands As Root

Notable quotes:
"... the function which converts user id into its username incorrectly treats -1, or its unsigned equivalent 4294967295, as 0, which is always the user ID of root user. ..."
Oct 22, 2019 | linux.slashdot.org

(thehackernews.com) Posted by BeauHD on Monday October 14, 2019 @07:30PM from the Su-doh dept. exomondo shares a report from The Hacker News:

... ... ...

The vulnerability, tracked as CVE-2019-14287 and discovered by Joe Vennix of Apple Information Security, is more concerning because the sudo utility has been designed to let users use their own login password to execute commands as a different user without requiring their password.

What's more interesting is that this flaw can be exploited by an attacker to run commands as root just by specifying the user ID "-1" or "4294967295."

That's because the function which converts user id into its username incorrectly treats -1, or its unsigned equivalent 4294967295, as 0, which is always the user ID of root user.

The vulnerability affects all Sudo versions prior to the latest released version 1.8.28, which has been released today.
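Concretely, the reported exploit form is trivial. With a sudoers rule such as the hypothetical "alice myhost = (ALL, !root) /usr/bin/id", an unpatched sudo accepts:

    sudo -u#-1 /usr/bin/id            # runs as uid 0 despite the !root restriction
    sudo -u#4294967295 /usr/bin/id    # same effect, using the unsigned form

Patched versions (1.8.28 and later) reject -1 and 4294967295 as invalid user IDs.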

    • Re: Not many systems vulnerable (by mysidia):

      If you have been blessed with the power to run commands as ANY user you want, then you are still specially privileged, even though you are not fully privileged.

      It's a rare/unusual configuration to say (all, !root) --- the people using this configuration on their systems should probably KNOW there are going to exist some ways that access can be abused to ultimately circumvent the intended !root rule - if not within sudo itself, then by using sudo to get a shell as a different user UID that belongs to some person or program who DOES have root permissions, and then causing crafted code to run as that user --- for example, by installing a Trojanned version of the screen command and modifying files in the home directory of a legitimate root user to alias the screen command to the trojanned version, which will log the password the next time that other user logs in normally and uses the sudo command.

[Oct 20, 2019] SSL certificate installation in RHEL 6 and 7

Oct 20, 2019 | access.redhat.com

Resolution

  1. Install/update the latest ca-certificates package:
     # yum install ca-certificates
  2. For example, download the RapidSSL Intermediate CA Certificates.
  3. Enable the dynamic CA configuration feature by running:
     # update-ca-trust enable
  4. Copy the RapidSSL Intermediate CA certificate to the directory /etc/pki/ca-trust/source/anchors/:
     # cp rapidSSL-ca.crt /etc/pki/ca-trust/source/anchors/
  5. Extract and add the Intermediate CA certificate to the list of trusted CAs:
     # update-ca-trust extract
  6. Verify the SSL certificate signed by RapidSSL:
     # openssl verify server.crt
     server.crt: OK

[Oct 02, 2019] raid5 - Can I recover a RAID 5 array if two drives have failed - Server Fault

Oct 02, 2019 | serverfault.com

Can I recover a RAID 5 array if two drives have failed?

I have a Dell 2600 with 6 drives configured in a RAID 5 on a PERC 4 controller. 2 drives failed at the same time, and according to what I know a RAID 5 is recoverable if 1 drive fails. I'm not sure if the fact I had six drives in the array might save my skin.

I bought 2 new drives and plugged them in but no rebuild happened as I expected. Can anyone shed some light?

Regardless of how many drives are in use, a RAID 5 array only allows for recovery in the event that just one disk at a time fails.

What 3molo says is a fair point but even so, not quite correct I think - if two disks in a RAID5 array fail at the exact same time then a hot spare won't help, because a hot spare replaces one of the failed disks and rebuilds the array without any intervention, and a rebuild isn't possible if more than one disk fails.

For now, I am sorry to say that your options for recovering this data are going to involve restoring a backup.

For the future you may want to consider one of the more robust forms of RAID (not sure what options a PERC4 supports) such as RAID 6 or a nested RAID array. Once you get above a certain number of disks in an array, you reach the point where the chance that more than one of them can fail before a replacement is installed and rebuilt becomes unacceptably high. -- Rob Moir

You can try to force one or both of the failed disks to be online from the BIOS interface of the controller. Then check that the data and the file system are consistent. -- Mircea Vutcovici

The direct answer is "no"; the indirect one is "it depends". Mainly it depends on whether the disks are partially out of order, or completely. In case they're partially broken, you can give it a try -- I would copy (using a tool like ddrescue) both failed disks. Then I'd try to run the bunch of disks using Linux SoftRAID -- re-trying with the proper order of disks and stripe size in read-only mode and counting CRC mismatches. It's quite doable, I should say -- this text in Russian mentions a 12-disk RAID50's recovery using LSR, for example. -- poige

It is possible if the raid had one spare drive and one of your failed disks died before the second one. In that case you just need to try to reconstruct the array virtually with third-party software. A small article about this process is on this page: http://www.angeldatarecovery.com/raid5-data-recovery/

And, if you really need the contents of one of the dead drives, you can send it to a recovery shop. With those images you can reconstruct the raid properly, with good chances.
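A hedged sketch of that image-first approach with Linux tools (device names, paths, and member order are placeholders; this illustrates the software-RAID technique the answer describes and is not a recipe for the PERC controller's own on-disk format):

    ddrescue /dev/sdc /mnt/rescue/disk3.img /mnt/rescue/disk3.map   # clone each failing member first
    ddrescue /dev/sdd /mnt/rescue/disk4.img /mnt/rescue/disk4.map
    losetup -fP --show /mnt/rescue/disk3.img                        # expose the images as block devices
    losetup -fP --show /mnt/rescue/disk4.img
    mdadm --assemble --readonly --force /dev/md0 \
        /dev/loop0 /dev/loop1 /dev/sde /dev/sdf /dev/sdg /dev/sdh   # try a member order, read-only
    fsck -n /dev/md0                                                # check only, never repair at this stage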

[Sep 23, 2019] How to recover deleted files with foremost on Linux - LinuxConfig.org

Sep 23, 2019 | linuxconfig.org
Details
System Administration
15 September 2019
In this article we will talk about foremost, a very useful open source forensic utility which is able to recover deleted files using a technique called data carving. The utility was originally developed by the United States Air Force Office of Special Investigations, and is able to recover several file types (support for specific file types can be added by the user via the configuration file). The program can also work on partition images produced by dd or similar tools.

In this tutorial you will learn:

Foremost is a forensic data recovery program for Linux used to recover files using their headers, footers, and data structures through a process known as file carving.

Software Requirements and Conventions Used
Software Requirements and Linux Command Line Conventions
Category Requirements, Conventions or Software Version Used
System Distribution-independent
Software The "foremost" program
Other Familiarity with the command line interface
Conventions # - requires given linux commands to be executed with root privileges either directly as a root user or by use of sudo command
$ - requires given linux commands to be executed as a regular non-privileged user
Installation

Since foremost is already present in all the major Linux distributions' repositories, installing it is a very easy task. All we have to do is to use our favorite distribution's package manager. On Debian and Ubuntu, we can use apt:

$ sudo apt install foremost

In recent versions of Fedora, we use the dnf package manager (the successor of yum) to install packages. The name of the package is the same:

$ sudo dnf install foremost

If we are using ArchLinux, we can use pacman to install foremost . The program can be found in the distribution "community" repository:

$ sudo pacman -S foremost



Basic usage
WARNING
No matter which file recovery tool or process you are going to use to recover your files, before you begin it is recommended to perform a low-level backup of the hard drive or partition, to avoid accidentally overwriting data. That way you can re-try the recovery even after an unsuccessful attempt. Check the following dd command guide on how to perform a low-level hard drive or partition backup.
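For instance, a minimal backup of the partition used in the examples below might look like this (the device name and destination path are assumptions; adjust them to your system, and make sure the destination lives on a different disk):

$ sudo dd if=/dev/sdb1 of=/mnt/backupdisk/sdb1.img bs=4M conv=sync,noerror status=progress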

The foremost utility tries to recover and reconstruct files on the basis of their headers, footers and data structures, without relying on filesystem metadata. This forensic technique is known as file carving. The program supports many file types out of the box (the default types appear in the output directory tree shown below).

The most basic way to use foremost is by providing a source to scan for deleted files (it can be either a partition or an image file, such as those generated with dd). Let's see an example. Imagine we want to scan the /dev/sdb1 partition: before we begin, a very important thing to remember is to never store retrieved data on the same partition we are retrieving the data from, to avoid overwriting deleted files still present on the block device. The command we would run is:

$ sudo foremost -i /dev/sdb1

By default, the program creates a directory called output inside the directory we launched it from and uses it as destination. Inside this directory, a subdirectory for each supported file type we are attempting to retrieve is created. Each directory will hold the corresponding file type obtained from the data carving process:

output
├── audit.txt
├── avi
├── bmp
├── dll
├── doc
├── docx
├── exe
├── gif
├── htm
├── jar
├── jpg
├── mbd
├── mov
├── mp4
├── mpg
├── ole
├── pdf
├── png  
├── ppt
├── pptx
├── rar
├── rif
├── sdw
├── sx
├── sxc
├── sxi
├── sxw
├── vis
├── wav
├── wmv
├── xls
├── xlsx
└── zip

When foremost completes its job, empty directories are removed; only the ones containing files are left on the filesystem. This lets us immediately know what types of files were successfully retrieved. By default the program tries to retrieve all the supported file types; to restrict our search we can, however, use the -t option and provide a list of the file types we want to retrieve, separated by commas. In the example below, we restrict the search to gif and pdf files only:

$ sudo foremost -t gif,pdf -i /dev/sdb1

https://www.youtube.com/embed/58S2wlsJNvo

In this video we will test the forensic data recovery program Foremost to recover a single png file from /dev/sdb1 partition formatted with the EXT4 filesystem.



Specifying an alternative destination

As we already said, if a destination is not explicitly declared, foremost creates an output directory inside our cwd. What if we want to specify an alternative path? All we have to do is to use the -o option and provide said path as argument. If the specified directory doesn't exist, it is created; if it exists but is not empty, the program complains:

ERROR: /home/egdoc/data is not empty
        Please specify another directory or run with -T.

To solve the problem, as suggested by the program itself, we can either use another directory or re-launch the command with the -T option. If we use the -T option, the output directory specified with the -o option is timestamped. This makes it possible to run the program multiple times with the same destination. In our case the directory that would be used to store the retrieved files would be:

/home/egdoc/data_Thu_Sep_12_16_32_38_2019
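For instance, re-using the destination from the error above, we could let the program timestamp the output directory for us (same source partition as in the previous examples):

$ sudo foremost -i /dev/sdb1 -o /home/egdoc/data -T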
The configuration file

The foremost configuration file can be used to specify file formats not natively supported by the program. Inside the file we can find several commented examples showing the syntax that should be used to accomplish the task. Here is an example involving the png type (the lines are commented since the file type is supported by default):

# PNG   (used in web pages)
#       (NOTE THIS FORMAT HAS A BUILTIN EXTRACTION FUNCTION)
#       png     y       200000  \x50\x4e\x47?   \xff\xfc\xfd\xfe

The information to provide in order to add support for a file type is, from left to right and separated by a tab character: the file extension (png in this case), whether the header and footer are case sensitive (y), the maximum file size in bytes (200000), the header (\x50\x4e\x47?) and the footer (\xff\xfc\xfd\xfe). Only the last field, the footer, is optional and can be omitted.

If the path of the configuration file is not explicitly provided with the -c option, a file named foremost.conf in the current working directory is searched for and used, if present. If it is not found, the default configuration file, /etc/foremost.conf, is used instead.
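For example, to point the program at a custom configuration file explicitly (the file name here is just an illustration):

$ sudo foremost -c ~/my-foremost.conf -i /dev/sdb1 -o $HOME/Documents/output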

Adding the support for a file type

By reading the examples provided in the configuration file, we can easily add support for a new file type. In this example we will add support for flac audio files. FLAC (Free Lossless Audio Codec) is a non-proprietary lossless audio format which provides compressed audio without quality loss. First of all, we know that the header of this file type in hexadecimal form is 66 4C 61 43 00 00 00 22 (fLaC in ASCII), and we can verify it by using a program like hexdump on a flac file:

$ hexdump -C blind_guardian_war_of_wrath.flac | head
00000000  66 4c 61 43 00 00 00 22  12 00 12 00 00 00 0e 00  |fLaC..."........|
00000010  36 f2 0a c4 42 f0 00 4d  04 60 6d 0b 64 36 d7 bd  |6...B..M.`m.d6..|
00000020  3e 4c 0d 8b c1 46 b6 fe  cd 42 04 00 03 db 20 00  |>L...F...B.... .|
00000030  00 00 72 65 66 65 72 65  6e 63 65 20 6c 69 62 46  |..reference libF|
00000040  4c 41 43 20 31 2e 33 2e  31 20 32 30 31 34 31 31  |LAC 1.3.1 201411|
00000050  32 35 21 00 00 00 12 00  00 00 54 49 54 4c 45 3d  |25!.......TITLE=|
00000060  57 61 72 20 6f 66 20 57  72 61 74 68 11 00 00 00  |War of Wrath....|
00000070  52 45 4c 45 41 53 45 43  4f 55 4e 54 52 59 3d 44  |RELEASECOUNTRY=D|
00000080  45 0c 00 00 00 54 4f 54  41 4c 44 49 53 43 53 3d  |E....TOTALDISCS=|
00000090  32 0c 00 00 00 4c 41 42  45 4c 3d 56 69 72 67 69  |2....LABEL=Virgi|

As you can see the file signature is indeed what we expected. Here we will assume a maximum file size of 30 MB, or 30000000 Bytes. Let's add the entry to the file:

flac    y       30000000    \x66\x4c\x61\x43\x00\x00\x00\x22

The footer signature is optional so here we didn't provide it. The program should now be able to recover deleted flac files. Let's verify it. To test that everything works as expected I previously placed, and then removed, a flac file from the /dev/sdb1 partition, and then proceeded to run the command:

$ sudo foremost -i /dev/sdb1 -o $HOME/Documents/output

As expected, the program was able to retrieve the deleted flac file (it was, on purpose, the only file on the device), although it renamed it with a random string. The original filename cannot be retrieved because, as we know, file metadata is stored in the filesystem, not in the file itself:

/home/egdoc/Documents
└── output
    ├── audit.txt
    └── flac
        └── 00020482.flac



The audit.txt file contains information about the actions performed by the program, in this case:

Foremost version 1.5.7 by Jesse Kornblum, Kris
Kendall, and Nick Mikus
Audit File

Foremost started at Thu Sep 12 23:47:04 2019
Invocation: foremost -i /dev/sdb1 -o /home/egdoc/Documents/output
Output directory: /home/egdoc/Documents/output
Configuration file: /etc/foremost.conf
------------------------------------------------------------------
File: /dev/sdb1
Start: Thu Sep 12 23:47:04 2019
Length: 200 MB (209715200 bytes)

Num      Name (bs=512)         Size      File Offset     Comment

0:      00020482.flac         28 MB        10486784
Finish: Thu Sep 12 23:47:04 2019

1 FILES EXTRACTED

flac:= 1
------------------------------------------------------------------

Foremost finished at Thu Sep 12 23:47:04 2019
Conclusion

In this article we learned how to use foremost, a forensic program able to retrieve deleted files of various types. We learned that the program works by using a technique called data carving, and relies on file signatures to achieve its goal. We saw an example of the program's usage, and we also learned how to add support for a specific file type using the syntax illustrated in the configuration file. For more information about the program's usage, please consult its manual page.

[Sep 22, 2019] Easing into automation with Ansible Enable Sysadmin

Sep 19, 2019 | www.redhat.com
It's easier than you think to get started automating your tasks with Ansible. This gentle introduction gives you the basics you need to begin streamlining your administrative life.

Posted by Jörg Kastning (Red Hat Accelerator)

"DippingToes 02.jpg" by Zhengan is licensed under CC BY-SA 4.0

At the end of 2015 and the beginning of 2016, we decided to use Red Hat Enterprise Linux (RHEL) as our third operating system, next to Solaris and Microsoft Windows. I was part of the team that tested RHEL, among other distributions, and would be involved in the upcoming operation of the new OS. Thinking about a fast-growing number of Red Hat Enterprise Linux systems, it occurred to me that I needed a tool to automate things, because without automation the number of hosts I can manage is limited.

I had experience with Puppet back in the day but did not like that tool because of its complexity. We had more modules and classes than hosts to manage back then. So, I took a look at Ansible version 2.1.1.0 in July 2016.

What I liked about Ansible and still do is that it is push-based. On a target node, only Python and SSH access are needed to control the node and push configuration settings to it. No agent needs to be removed if you decide that Ansible isn't the right tool for you. The YAML syntax is easy to read and write, and the option to use playbooks as well as ad hoc commands makes Ansible a flexible solution that helps save time in our day-to-day business. So, it was at the end of 2016 when we decided to evaluate Ansible in our environment.

First steps

As a rule of thumb, you should begin automating things that you have to do on a daily or at least a regular basis. That way, automation saves time for more interesting or more important things. I followed this rule by using Ansible for the following tasks:

  1. Set a baseline configuration for newly provisioned hosts (set DNS, time, network, sshd, etc.)
  2. Set up patch management to install Red Hat Security Advisories (RHSAs) .
  3. Test how useful the ad hoc commands are, and where we could benefit from them.
Baseline Ansible configuration

For us, baseline configuration is the configuration every newly provisioned host gets. This practice makes sure the host fits into our environment and is able to communicate on the network. Because the same configuration steps have to be made for each new host, this is an awesome step to get started with automation.

The following are the tasks I started with:

(Some of these steps are already published here on Enable Sysadmin, as you can see, and others might follow soon.)

All of these tasks have in common that they are small and easy to start with, letting you gather experience with using different kinds of Ansible modules, roles, variables, and so on. You can run each of these roles and tasks standalone, or tie them all together in one playbook that sets the baseline for your newly provisioned system.
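As a rough sketch only (the role names below are placeholders, not the author's actual roles), such a baseline playbook could look like this:

---
# baseline.yml -- hypothetical example; role names are placeholders
- hosts: new_hosts
  become: yes
  roles:
    - set_dns
    - set_timesync
    - configure_sshd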

Red Hat Enterprise Linux Server patch management with Ansible

As I explained on my GitHub page for ansible-role-rhel-patchmanagement , in our environment, we deploy Red Hat Enterprise Linux Servers for our operating departments to run their applications.


This role was written to provide a mechanism for installing Red Hat Security Advisories on target nodes once a month. In our particular use case, only RHSAs are installed, to ensure a minimum security level. The installation is enforced once a month, and the advisories are summarized in "patch sets"; this way, it is ensured that the same advisories are used for all stages during a patch cycle.

The Ansible Inventory nodes are summarized in one of the following groups, each of which defines when a node is scheduled for patch installation:

If packages were updated on the target nodes, the hosts will reboot afterward.

Because the production systems are most important, they are divided into two separate groups (phase3 and phase4) to decrease the risk of failure and service downtime due to advisory installation.

You can find more about this role in my GitHub repo: https://github.com/Tronde/ansible-role-rhel-patchmanagement .
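As an illustration only (this is a guess at how such a role might be applied, not the author's actual playbook; the group name follows the phase groups described above, and the role name assumes the repository was cloned into the roles path):

---
# patchmanagement.yml -- hypothetical usage sketch
- hosts: phase3
  become: yes
  roles:
    - ansible-role-rhel-patchmanagement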

Updating and patch management are tasks every sysadmin has to deal with. With these roles, Ansible helped me get this task done every month, and I don't have to care about it anymore. Only when a system is not reachable, or yum has a problem, do I get an email report telling me to take a look. But, I got lucky, and have not yet received any mail report for the last couple of months, now. (Yes, of course, the system is able to send mail.)

Ad hoc commands

The possibility to run ad hoc commands for quick (and dirty) tasks was one of the reasons I chose Ansible. You can use these commands to gather information when you need them or to get things done without the need to write a playbook first.

I used ad hoc commands in cron jobs until I found the time to write playbooks for them. But, with time comes practice, and today I try to use playbooks and roles for every task that has to run more than once.

Here are small examples of ad hoc commands that provide quick information about your nodes.

Query package version
ansible all -m command -a'/usr/bin/rpm -qi <PACKAGE NAME>' | grep 'SUCCESS\|Version'
Query OS-Release
ansible all -m command -a'/usr/bin/cat /etc/os-release'
Query running kernel version
ansible all -m command -a'/usr/bin/uname -r'
Query DNS servers in use by nodes
ansible all -m command -a'/usr/bin/cat /etc/resolv.conf' | grep 'SUCCESS\|nameserver'

Hopefully, these samples give you an idea of what ad hoc commands can be used for.

Summary

It's not hard to start with automation. Just look for small and easy tasks you do every single day, or even more than once a day, and let Ansible do these tasks for you.

Eventually, you will be able to solve more complex tasks as your automation skills grow. But keep things as simple as possible. You gain nothing when you have to troubleshoot a playbook for three days when it solves a task you could have done in an hour.

[Want to learn more about Ansible? Check out these free e-books .]

[Sep 18, 2019] Oracle woos Red Hat users with Autonomous Linux InfoWorld

Notable quotes:
"... Sending a shot across the bow to rival Red Hat, Oracle has introduced Oracle Autonomous Linux, an "autonomous operating system" in the Oracle Cloud that is designed to eliminate manual OS management and human error. Oracle Autonomous Linux is patched, updated, and tuned without human interaction. ..."
Sep 18, 2019 | www.infoworld.com
Sending a shot across the bow to rival Red Hat, Oracle has introduced Oracle Autonomous Linux, an "autonomous operating system" in the Oracle Cloud that is designed to eliminate manual OS management and human error. Oracle Autonomous Linux is patched, updated, and tuned without human interaction.

The Oracle Autonomous Linux service is paired with the new Oracle OS Management Service, which is a cloud infrastructure component providing control over systems running Autonomous Linux, Linux, or Windows. Binary compatibility is offered for IBM Red Hat Enterprise Linux, providing for application compatibility on Oracle Cloud infrastructure.


The benefits of Oracle Autonomous Linux include:

Leveraging Oracle Cloud Infrastructure, Autonomous Linux has capabilities such as autoscaling, monitoring, and lifecycle management across pools. Autonomous Linux and OS Management Service are included with Oracle Premier Support at no extra charge with Oracle Cloud Infrastructure compute services. Based on Oracle Linux, Autonomous Linux is available for deployment only in Oracle Cloud and not for on-premises deployments.

Where to access Oracle Autonomous Linux

You can access an instance of Oracle Autonomous Linux from the Oracle Cloud Marketplace.

[Sep 16, 2019] 10 Ansible modules you need to know Opensource.com

Sep 16, 2019 | opensource.com

10 Ansible modules you need to know
See examples and learn the most important modules for automating everyday tasks with Ansible. 11 Sep 2019, by DirectedSoul (Red Hat)


Ansible is an open source IT configuration management and automation platform. It uses human-readable YAML templates so users can program repetitive tasks to happen automatically without having to learn an advanced programming language.

Ansible is agentless, which means the nodes it manages do not require any software to be installed on them. This eliminates potential security vulnerabilities and makes overall management smoother.

Ansible modules are standalone scripts that can be used inside an Ansible playbook. A playbook consists of a play, and a play consists of tasks. These concepts may seem confusing if you're new to Ansible, but as you begin writing and working more with playbooks, they will become familiar.
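To make the playbook/play/task relationship concrete, here is a minimal sketch (the host group and package names are arbitrary examples, not taken from the article):

---
# one playbook containing one play, which contains two tasks
- name: Configure web servers
  hosts: webservers
  become: yes
  tasks:
    - name: Ensure Apache is installed
      yum:
        name: httpd
        state: present
    - name: Ensure Apache is running
      service:
        name: httpd
        state: started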


There are some modules that are frequently used in automating everyday tasks; those are the ones that we will cover in this article.

Ansible has three main files that you need to consider:

Module 1: Package management

There is a module for most popular package managers, such as DNF and APT, to enable you to install any package on a system. Functionality depends entirely on the package manager, but usually these modules can install, upgrade, downgrade, remove, and list packages. The names of relevant modules are easy to guess. For example, the DNF module is dnf_module , the old YUM module (required for Python 2 compatibility) is yum_module , while the APT module is apt_module , the Slackpkg module is slackpkg_module , and so on.

Example 1:

- name: install the latest version of Apache and MariaDB
  dnf:
    name:
      - httpd
      - mariadb-server
    state: latest

This installs the Apache web server and the MariaDB SQL database.

Example 2:

- name: Install a list of packages
  yum:
    name:
      - nginx
      - postgresql
      - postgresql-server
    state: present

This installs the listed packages, showing how multiple packages can be installed in a single task.

Module 2: Service

After installing a package, you need a module to start it. The service module enables you to start, stop, and reload installed packages; this comes in pretty handy.

Example 1:

- name: Start service foo, based on running process /usr/bin/foo
  service:
    name: foo
    pattern: /usr/bin/foo
    state: started

This starts the service foo .

Example 2:

- name: Restart network service for interface eth0
  service:
    name: network
    state: restarted
    args: eth0

This restarts the network service of the interface eth0 .

Module 3: Copy

The copy module copies a file from the local or remote machine to a location on the remote machine.

Example 1:

- name: Copy a new "ntp.conf" file into place, backing up the original if it differs from the copied version
  copy:
    src: /mine/ntp.conf
    dest: /etc/ntp.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes

Example 2:

- name: Copy file with owner and permission, using symbolic representation
  copy:
    src: /srv/myfiles/foo.conf
    dest: /etc/foo.conf
    owner: foo
    group: foo
    mode: u=rw,g=r,o=r

Module 4: Debug

The debug module prints statements during execution and can be useful for debugging variables or expressions without having to halt the playbook.

Example 1:

- name: Display all variables/facts known for a host
  debug:
    var: hostvars[inventory_hostname]
    verbosity: 4

This displays all the variable information for a host that is defined in the inventory file.

Example 2:

- name: Write some content in a file /tmp/foo.txt
  copy:
    dest: /tmp/foo.txt
    content: |
      Good Morning!
      Awesome sunshine today.
  register: display_file_content

- name: Debug display_file_content
  debug:
    var: display_file_content
    verbosity: 2

This registers the content of the copy module output and displays it only when you specify verbosity as 2. For example:

ansible-playbook demo.yaml -vv
Module 5: File

The file module manages the file and its properties.

Example 1:

- name: Change file ownership, group and permissions
  file:
    path: /etc/foo.conf
    owner: foo
    group: foo
    mode: '0644'

This sets the owner and group of /etc/foo.conf to foo and its permissions to 0644.

Example 2:

- name: Create a directory if it does not exist
  file:
    path: /etc/some_directory
    state: directory
    mode: '0755'

This creates a directory named some_directory and sets the permission to 0755 .

Module 6: Lineinfile

The lineinfile module manages lines in a text file.

Example 1:

- name: Ensure SELinux is set to enforcing mode
  lineinfile:
    path: /etc/selinux/config
    regexp: '^SELINUX='
    line: SELINUX=enforcing

This sets the value of SELINUX=enforcing .

Example 2:

- name: Add a line to a file if the file does not exist, without passing regexp
  lineinfile:
    path: /etc/resolv.conf
    line: 192.168.1.99 foo.lab.net foo
    create: yes

This adds an entry for the IP and hostname in the resolv.conf file.

Module 7: Git

The git module manages git checkouts of repositories to deploy files or software.

Example 1:

# Create a git archive from a repo
- git:
    repo: https://github.com/ansible/ansible-examples.git
    dest: /src/ansible-examples
    archive: /tmp/ansible-examples.zip

Example 2:

- git:
    repo: https://github.com/ansible/ansible-examples.git
    dest: /src/ansible-examples
    separate_git_dir: /src/ansible-examples.git

This clones a repo with a separate Git directory.

Module 8: Cli_command

The cli_command module , first available in Ansible 2.7, provides a platform-agnostic way of pushing text-based configurations to network devices over the network_cli connection plugin.

Example 1:

- name: commit with comment
  cli_config:
    config: set system host-name foo
    commit_comment: this is a test

This sets the hostname for a switch and exits with a commit message.

Example 2:

- name: configurable backup path
  cli_config:
    config: "{{ lookup('template', 'basic/config.j2') }}"
    backup: yes
    backup_options:
      filename: backup.cfg
      dir_path: /home/user

This backs up a config to a different destination file.

Module 9: Archive

The archive module creates a compressed archive of one or more files. By default, it assumes the compression source exists on the target.

Example 1:

- name: Compress directory /path/to/foo/ into /path/to/foo.tgz
  archive:
    path: /path/to/foo
    dest: /path/to/foo.tgz

Example 2:

- name: Create a bz2 archive of multiple files, rooted at /path
  archive:
    path:
      - /path/to/foo
      - /path/wong/foo
    dest: /path/file.tar.bz2
    format: bz2

Module 10: Command

One of the most basic but useful modules, the command module takes the command name followed by a list of space-delimited arguments.

Example 1:

- name: return motd to registered var
  command: cat /etc/motd
  register: mymotd

Example 2:

- name: Change the working directory to somedir/ and run the command as db_owner if /path/to/database does not exist
  command: /usr/bin/make_database.sh db_user db_name
  become: yes
  become_user: db_owner
  args:
    chdir: somedir/
    creates: /path/to/database

Conclusion

There are tons of modules available in Ansible, but these ten are the most basic and powerful ones you can use for an automation job. As your requirements change, you can learn about other useful modules by entering ansible-doc <module-name> on the command line or refer to the official documentation .

[Sep 16, 2019] Creating virtual machines with Vagrant and Ansible for website development Opensource.com

Sep 16, 2019 | opensource.com

Using Vagrant and Ansible to deploy virtual machines for web development
23 Feb 2016, by Betsy Gamrat. Image by Jason Baker for Opensource.com.


Vagrant and Ansible are tools to efficiently provision virtual machines, also called VMs or, in Vagrant terms, "boxes." We begin with a short discussion of why a web developer would invest the time to use these tools, then cover the required software, an overview of how Vagrant works with virtual machine providers, and the use of Ansible to provision a virtual machine.

Background

First, this article builds on our guide to installing and testing eZ Publish 5 in a virtual machine, which may be a useful reference. We'll describe the tools you can use to automate the creation and provisioning of virtual machines. There is much to cover beyond what we talk about here, but for now we will focus on providing a general overview.

Why should you use virtual machines for website development?

The traditional ways that developers work on websites are on remote development machines or locally, directly on their main operating system. There are many advantages to using virtual machines for development, including the following:

Vagrant and Ansible help to automate the provisioning of the virtual machines. Vagrant handles the starting and stopping of the machines as well as some configuration, and Ansible breaks down the details of the machines into easy-to-read configuration files and installs and configures the software within the virtual machines.

General approach

We usually keep the site code on the host operating system and share these files from the host operating system to the virtual machines. This way, the virtual machines can be loaded with only the software required to run the site, and each team member can use their favorite local tools for code editing, version control, and more under the operating system they use most often.

Risks

This scheme is not without risk and complexity. A virtual machine is a great tool for your collection, but like any tool, you will need to take some time to learn how to use it.

Getting started

The first step is to install the required software: Vagrant , Ansible , and VirtualBox . We will focus only on VirtualBox in this post, but you can also use a different provider , including many options which are open source. You may need some VirtualBox extensions and Vagrant plugins as well. Take the time to read the documentation carefully.

Then, you will need a starting point for your virtual server, typically called a "base box." For your first virtual machine, it is easiest to use an existing box. There are many boxes up at HashiCorp's Atlas and at Vagrantbox.es which are suitable for testing, but be sure you use a trusted provider for any box which is used in production.

Once you've chosen your box, these commands should bring it to life:

$ vagrant box add name-of-box url-of-box
$ vagrant init name-of-box
$ vagrant up
Provisioning, customizing, and accessing the virtual machine

Once the box is up and running, you can start adding software to it using Ansible. Plan to spend a lot of time learning Ansible. It is well worth the investment. You'll use Ansible to load the system software, create databases, configure the server, create users, set file ownership and permissions, set up services, and much more -- basically, to configure the virtual machine to include everything you need. Once you have the Ansible scripts set up you will be able to re-use them with different virtual machines and also run them against remote servers (which is a topic for another day!).

The easiest way to SSH into the box is:

$ vagrant ssh

You may update your /etc/hosts file to map the virtual machine box IP address to an easy to remember name for SSH and browser access. Once the box is running and serving pages, you can start working on the site.
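For example, an /etc/hosts entry like the following (the address and name here are placeholders; use the IP address your Vagrantfile assigns to the box):

192.168.33.10   devsite.local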

Development workflow

Using a virtual machine as described here doesn't significantly change normal development workflows when it comes to editing code. The host machine and virtual machine share the application files through the path configured under Vagrant. You can edit the files using your code editor of choice on the host, or make minor adjustments with a text editor on the virtual machine. You can also keep version control tools on the host machine. In other words, all of the code modifications made on the host machine automatically show up on the virtual machine.

You just get to enjoy the convenience of local development, and an environment that mimics the server(s) where your code will be deployed!

This article was originally published at Mugo Web . Republished with permission.

[Sep 16, 2019] An introduction to Virtual Machine Manager Opensource.com

Sep 16, 2019 | opensource.com

When installing a VM to run as a server, I usually disable or remove its sound capability. To do so, select Sound and click Remove or right-click on Sound and choose Remove Hardware .

You can also add hardware with the Add Hardware button at the bottom. This brings up the Add New Virtual Hardware screen where you can add additional storage devices, memory, sound, etc. It's like having access to a very well-stocked (if virtual) computer hardware warehouse.


Once you are happy with your VM configuration, click Begin Installation , and the system will boot and begin installing your specified operating system from the ISO.


Once it completes, it reboots, and your new VM is ready for use.


Virtual Machine Manager is a powerful tool for desktop Linux users. It is open source and an excellent alternative to proprietary and closed virtualization products.

[Sep 13, 2019] How to use Ansible Galaxy Enable Sysadmin

Sep 13, 2019 | www.redhat.com

Ansible is a multiplier, a tool that automates and scales infrastructure of every size. It is considered to be a configuration management, orchestration, and deployment tool. It is easy to get up and running with Ansible. Even a new sysadmin could start automating with Ansible in a matter of a few hours.

Ansible automates using the SSH protocol. The control machine uses an SSH connection to communicate with its target hosts, which are typically Linux hosts. If you're a Windows sysadmin, you can still use Ansible to automate your Windows environments using WinRM as opposed to SSH. Presently, though, the control machine still needs to run Linux.


As a new sysadmin, you might start with just a few playbooks. But as your automation skills continue to grow, and you become more familiar with Ansible, you will learn best practices and further realize that as your playbooks increase, using Ansible Galaxy becomes invaluable.

In this article, you will learn a bit about Ansible Galaxy, its structure, and how and when you can put it to use.

What Ansible does

Common sysadmin tasks that can be performed with Ansible include patching, updating systems, user and group management, and provisioning. Ansible presently has a huge footprint in IT automation -- if not the largest -- and is considered to be the most popular and widely used configuration management, orchestration, and deployment tool available today.

One of the main reasons for its popularity is its simplicity. It's simple, powerful, and agentless, which means a new or entry-level sysadmin can hit the ground automating in a matter of hours. Ansible allows you to scale quickly, efficiently, and cross-functionally.

Create roles with Ansible Galaxy

Ansible Galaxy is essentially a large public repository of Ansible roles. Roles ship with READMEs detailing the role's use and available variables. Galaxy contains a large number of roles that are constantly evolving and increasing.

Galaxy can use git to add other role sources, such as GitHub. You can initialize a new galaxy role using ansible-galaxy init , or you can install a role directly from the Ansible Galaxy role store by executing the command ansible-galaxy install <name of role> .

Here are some helpful ansible-galaxy commands you might use from time to time:

To create an Ansible role using Ansible Galaxy, we need to use the ansible-galaxy command and its templates. Roles must be downloaded before they can be used in playbooks, and they are placed into the default directory /etc/ansible/roles . You can find role examples at https://galaxy.ansible.com/geerlingguy :
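For example (the role installed below, geerlingguy.apache, is just one of the roles published under that account; substitute whichever role you need):

$ ansible-galaxy init my_custom_role          # scaffold a new role skeleton
$ ansible-galaxy install geerlingguy.apache   # install a role from Galaxy
$ ansible-galaxy list                         # show the roles installed locally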

Create collections

While Ansible Galaxy has been the go-to tool for constructing and managing roles, with new iterations of Ansible you are bound to see changes or additions. On Ansible version 2.8 you get the new feature of collections .

What are collections and why are they worth mentioning? As the Ansible documentation states:

Collections are a distribution format for Ansible content. They can be used to package and distribute playbooks, roles, modules, and plugins.

Collections follow a simple structure:

collection/
├── docs/
├── galaxy.yml
├── plugins/
│ ├──  modules/
│ │ └──  module1.py
│ ├──  inventory/
│ └──  .../
├── README.md
├── roles/
│ ├──  role1/
│ ├──  role2/
│ └──  .../
├── playbooks/
│ ├──  files/
│ ├──  vars/
│ ├──  templates/
│ └──  tasks/
└──  tests/
Creating a collection skeleton.

The ansible-galaxy collection command implements the following subcommands. Notably, a few of them are the same as those used with ansible-galaxy:

In order to determine what can go into a collection, a great resource can be found here .
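As a quick illustration of the collection subcommands (the namespace and collection names are placeholders, and the exact tarball name depends on the version you build):

$ ansible-galaxy collection init my_namespace.my_collection
$ ansible-galaxy collection build my_namespace/my_collection
$ ansible-galaxy collection install my_namespace-my_collection-1.0.0.tar.gz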

Conclusion

Establish yourself as a stellar sysadmin with an automation solution that is simple, powerful, agentless, and scales your infrastructure quickly and efficiently. Using Ansible Galaxy to create roles is superb thinking, and an ideal way to be organized and thoughtful in managing your ever-growing playbooks.

The only way to improve your automation skills is to work with a dedicated tool and prove the value and positive impact of automation on your infrastructure.

[Sep 06, 2019] Python vs. Ruby Which is best for web development Opensource.com

Sep 06, 2019 | opensource.com

Python was developed organically in the scientific space as a prototyping language that easily could be translated into C++ if a prototype worked. This happened long before it was first used for web development. Ruby, on the other hand, became a major player specifically because of web development; the Rails framework extended Ruby's popularity with people developing complex websites.

Which programming language best suits your needs? Here is a quick overview of each language to help you choose:

Approach: one best way vs. human-language

Python

Python takes a direct approach to programming. Its main goal is to make everything obvious to the programmer. In Python, there is only one "best" way to do something. This philosophy has led to a language strict in layout.

Python's core philosophy consists of three key hierarchical principles:

This regimented philosophy results in Python being eminently readable and easy to learn -- and why Python is great for beginning coders. Python has a big foothold in introductory programming courses . Its syntax is very simple, with little to remember. Because its code structure is explicit, the developer can easily tell where everything comes from, making it relatively easy to debug.

Python's hierarchy of principles is evident in many aspects of the language. Its use of whitespace to do flow control as a core part of the language syntax differs from most other languages, including Ruby. The way you indent code determines the meaning of its action. This use of whitespace is a prime example of Python's "explicit" philosophy: the shape a Python app takes spells out its logic and how the app will act.

Ruby

In contrast to Python, Ruby focuses on "human-language" programming, and its code reads like a verbal language rather than a machine-based one, which many programmers, both beginners and experts, like. Ruby follows the principle of " least astonishment ," and offers myriad ways to do the same thing. These similar methods can have multiple names, which many developers find confusing and frustrating.

Unlike Python, Ruby makes use of "blocks," a first-class object that is treated as a unit within a program. In fact, Ruby takes the concept of OOP (Object-Oriented Programming) to its limit. Everything is an object -- even global variables are actually represented within the ObjectSpace object. Classes and modules are themselves objects, and functions and operators are methods of objects. This ability makes Ruby especially powerful, especially when combined with its other primary strength: functional programming and the use of lambdas.

In addition to blocks and functional programming, Ruby provides programmers with many other features, including fragmentation, hashable and unhashable types, and mutable strings.

Ruby's fans find its elegance to be one of its top selling points. At the same time, Ruby's "magical" features and flexibility can make it very hard to track down bugs.

Communities: stability vs. innovation

Although features and coding philosophy are the primary drivers for choosing a given language, the strength of a developer community also plays an important role. Fortunately, both Python and Ruby boast strong communities.

Python

Python's community already includes a large Linux and academic community and therefore offers many academic use cases in both math and science. That support gives the community a stability and diversity that only grows as Python increasingly is used for web development.

Ruby

However, Ruby's community has focused primarily on web development from the get-go. It tends to innovate more quickly than the Python community, but this innovation also causes more things to break. In addition, while it has gotten more diverse, it has yet to reach the level of diversity that Python has.

Final thoughts

For web development, Ruby has Rails and Python has Django. Both are powerful frameworks, so when it comes to web development, you can't go wrong with either language. Your decision will ultimately come down to your level of experience and your philosophical preferences.

If you plan to focus on building web applications, Ruby is popular and flexible. There is a very strong community built upon it and they are always on the bleeding edge of development.

If you are interested in building web applications and would like to learn a language that's used more generally, try Python. You'll get a diverse community and lots of influence and support from the various industries in which it is used.

Tom Radcliffe - Tom Radcliffe has over 20 years experience in software development and management in both academia and industry. He is a professional engineer (PEO and APEGBC) and holds a PhD in physics from Queen's University at Kingston. Tom brings a passion for quantitative, data-driven processes to ActiveState .

[Aug 31, 2019] Linux on your laptop A closer look at EFI boot options

Aug 31, 2019 | www.zdnet.com
Before EFI, the standard boot process for virtually all PC systems was called "MBR", for Master Boot Record; today you are likely to hear it referred to as "Legacy Boot". This process depended on using the first physical block on a disk to hold some information needed to boot the computer (thus the name Master Boot Record); specifically, it held the disk address at which the actual bootloader could be found, and the partition table that defined the layout of the disk. Using this information, the PC firmware could find and execute the bootloader, which would then bring up the computer and run the operating system.

This system had a number of rather obvious weaknesses and shortcomings. One of the biggest was that you could only have one bootable object on each physical disk drive (at least as far as the firmware boot was concerned). Another was that if that first sector on the disk became corrupted somehow, you were in deep trouble.

Over time, as part of the Extensible Firmware Interface, a new approach to boot configuration was developed. Rather than storing critical boot configuration information in a single "magic" location, EFI uses a dedicated "EFI boot partition" on the disk. This is a completely normal, standard disk partition, the same kind that may be used to hold the operating system or system recovery data.

The only requirement is that it be FAT formatted, and it should have the boot and esp partition flags set (esp stands for EFI System Partition). The specific data and programs necessary for booting is then kept in directories on this partition, typically in directories named to indicate what they are for. So if you have a Windows system, you would typically find directories called 'Boot' and 'Microsoft' , and perhaps one named for the manufacturer of the hardware, such as HP. If you have a Linux system, you would find directories called opensuse, debian, ubuntu, or any number of others depending on what particular Linux distribution you are using.

It should be obvious from the description so far that it is perfectly possible with the EFI boot configuration to have multiple boot objects on a single disk drive.

Before going any further, I should make it clear that if you install Linux as the only operating system on a PC, it is not necessary to know all of this configuration information in detail. The installer should take care of setting all of this up, including creating the EFI boot partition (or using an existing EFI boot partition), and further configuring the system boot list so that whatever system you install becomes the default boot target.

If you were to take a brand new computer with UEFI firmware, and load it from scratch with any of the current major Linux distributions, it would all be set up, configured, and working just as it is when you purchase a new computer preloaded with Windows (or when you load a computer from scratch with Windows). It is only when you want to have more than one bootable operating system – especially when you want to have both Linux and Windows on the same computer – that things may become more complicated.

The problems that arise with such "multiboot" systems are generally related to getting the boot priority list defined correctly.

When you buy a new computer with Windows, this list typically includes the Windows bootloader on the primary disk, and then perhaps some other peripheral devices such as USB, network interfaces and such. When you install Linux alongside Windows on such a computer, the installer will add the necessary information to the EFI boot partition, but if the boot priority list is not changed, then when the system is rebooted after installation it will simply boot Windows again, and you are likely to think that the installation didn't work.

There are several ways to modify this boot priority list, but exactly which ones are available and whether or how they work depends on the firmware of the system you are using, and this is where things can get really messy. There are just about as many different UEFI firmware implementations as there are PC manufacturers, and the manufacturers have shown a great deal of creativity in the details of this firmware.

First, in the simplest case, there is a software utility included with Linux called efibootmgr that can be used to modify, add or delete the boot priority list. If this utility works properly, and the changes it makes are permanent on the system, then you would have no other problems to deal with, and after installing it would boot Linux and you would be happy. Unfortunately, while this is sometimes the case it is frequently not. The most common reason for this is that changes made by software utilities are not actually permanently stored by the system BIOS, so when the computer is rebooted the boot priority list is restored to whatever it was before, which generally means that Windows gets booted again.
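For example (the entry numbers below are illustrative; your firmware will report its own):

$ sudo efibootmgr                  # list boot entries and the current BootOrder
$ sudo efibootmgr -o 0001,0000     # put entry Boot0001 (e.g. the Linux loader) first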

The other common way of modifying the boot priority list is via the computer BIOS configuration program. The details of how to do this are different for every manufacturer, but the general procedure is approximately the same. First you have to press the BIOS configuration key (usually F2, but not always, unfortunately) during system power-on (POST). Then choose the Boot item from the BIOS configuration menu, which should get you to a list of boot targets presented in priority order. Then you need to modify that list; sometimes this can be done directly in that screen, via the usual F5/F6 up/down key process, and sometimes you need to proceed one level deeper to be able to do that. I wish I could give more specific and detailed information about this, but it really is different on every system (sometimes even on different systems produced by the same manufacturer), so you just need to proceed carefully and figure out the steps as you go.

I have seen a few rare cases of systems where neither of these methods works, or at least they don't seem to be permanent, and the system keeps reverting to booting Windows. Again, there are two ways to proceed in this case. The first is by simply pressing the "boot selection" key during POST (power-on). Exactly which key this is varies, I have seen it be F12, F9, Esc, and probably one or two others. Whichever key it turns out to be, when you hit it during POST you should get a list of bootable objects defined in the EFI boot priority list, so assuming your Linux installation worked you should see it listed there. I have known of people who were satisfied with this solution, and would just use the computer this way and have to press boot select each time they wanted to boot Linux.

The alternative is to actually modify the files in the EFI boot partition, so that the (unchangeable) Windows boot procedure would actually boot Linux. This involves overwriting the Windows file bootmgfw.efi with the Linux file grubx64.efi. I have done this, especially in the early days of EFI boot, and it works, but I strongly advise you to be extremely careful if you try it, and make sure that you keep a copy of the original bootmgfw.efi file. Finally, just as a final (depressing) warning, I have also seen systems where this seemed to work, at least for a while, but then at some unpredictable point the boot process seemed to notice that something had changed and it restored bootmgfw.efi to its original state – thus losing the Linux boot configuration again. Sigh.
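Sketched out, and assuming the EFI partition is mounted at /boot/efi and the distribution directory is named ubuntu (both of which vary from system to system), the swap looks roughly like this; keep the backup copy somewhere safe:

$ sudo cp /boot/efi/EFI/Microsoft/Boot/bootmgfw.efi /boot/efi/EFI/Microsoft/Boot/bootmgfw.efi.orig
$ sudo cp /boot/efi/EFI/ubuntu/grubx64.efi /boot/efi/EFI/Microsoft/Boot/bootmgfw.efi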

So, that's the basics of EFI boot, and how it can be configured. But there are some important variations possible, and some caveats to be aware of.

[Aug 28, 2019] How to navigate Ansible documentation Enable Sysadmin

Aug 28, 2019 | www.redhat.com

We take our first glimpse at the Ansible documentation on the official website. While Ansible can be overwhelming with so many immediate options, let's break down what is presented to us here. Putting our attention on the page's main pane, we are given five offerings from Ansible. This pane is a central location, or one-stop-shop, to maneuver through the documentation for products like Ansible Tower, Ansible Galaxy, and Ansible Lint.

We can even dive into Ansible Network for specific module documentation that extends the power and ease of Ansible automation to network administrators. The focal point of the rest of this article will be around Ansible Project, to give us a great starting point into our automation journey:


Once we click the Ansible Documentation tile under the Ansible Project section, the first action we should take is to ensure we are viewing the documentation's correct version. We can get our current version of Ansible from our control node's command line by running ansible --version. Armed with the version information provided by the output, we can select the matching version in the site's upper-left-hand corner using the drop-down menu, which by default says latest.


[Aug 18, 2019] Oracle Linux 7 Update 7 Released

Oracle Linux 7 Update 7 features many of the same changes as RHEL 7.7, while also adding an updated Unbreakable Enterprise Kernel Release 5 based on Linux 4.14.35 with many extra patches, compared to RHEL 7's default Linux 3.10-based kernel.
Oracle Linux can be used for free, with paid support available if needed, so it is a more flexible option than CentOS -- at least for companies that have no money to buy RHEL and are reluctant to run CentOS because some of their applications are not supported on it.
Aug 15, 2019 | blogs.oracle.com

Oracle Linux 7 Update 7 ships with the following kernel packages, that include bug fixes, security fixes and enhancements:
•Unbreakable Enterprise Kernel (UEK) Release 5 (kernel-uek-4.14.35-1902.3.2.el7) for x86-64 and aarch64
•Red Hat Compatible Kernel (RHCK) (kernel-3.10.0-1062.el7) for x86-64 only

Oracle Linux maintains user space compatibility with Red Hat Enterprise Linux (RHEL), which is independent of the kernel version that underlies the operating system. Existing applications in user space will continue to run unmodified on Oracle Linux 7 Update 7 with UEK Release 5 and no re-certifications are needed for applications already certified with Red Hat Enterprise Linux 7 or Oracle Linux 7.

[Aug 18, 2019] Hundreds of Thousands of People Are Using Passwords That Have Already Been Hacked, Google Says

Aug 18, 2019 | tech.slashdot.org

(vice.com)

Posted by msmash on Friday August 16, 2019 @12:45PM

A new Google study this week confirmed the obvious: internet users need to stop using the same password for multiple websites unless they're keen on having their data hijacked, their identity stolen, or worse. From a report: It seems like not a day goes by without a major company being hacked or leaving user email addresses and passwords exposed to the public internet. These login credentials are then routinely used by hackers to hijack your accounts, a threat that's largely mitigated by using a password manager and unique password for each site you visit. Sites like "have I been pwned?" can help users track if their data has been exposed, and whether they need to worry about their credentials bouncing around the dark web. But it's still a confusing process for many users unsure of which passwords need updating.

To that end, last February Google unveiled a new experimental Password Checkup extension for Chrome . The extension warns you any time you log into a website using one of over 4 billion publicly-accessible usernames and passwords that have been previously exposed by a major hack or breach, and prompts you to change your password when necessary. The extension was built in concert with cryptography experts at Stanford University to ensure that Google never learns your usernames or passwords, the company says in an explainer. Anonymous telemetry data culled from the extension has provided Google with some interesting information on how widespread the practice of account hijacking and non-unique passwords really is.

[Aug 04, 2019] 10 YAML tips for people who hate YAML Enable SysAdmin

Aug 04, 2019 | www.redhat.com

10 YAML tips for people who hate YAML
Do you hate YAML? These tips might ease your pain.

Posted June 10, 2019 | by Seth Kenlon (Red Hat)

There are lots of formats for configuration files: a list of values, key and value pairs, INI files, YAML, JSON, XML, and many more. Of these, YAML sometimes gets cited as a particularly difficult one to handle for a few different reasons. While its ability to reflect hierarchical values is significant and its minimalism can be refreshing to some, its Python-like reliance upon syntactic whitespace can be frustrating.

However, the open source world is diverse and flexible enough that no one has to suffer through abrasive technology, so if you hate YAML, here are 10 things you can (and should!) do to make it tolerable. Starting with zero, as any sensible index should.

0. Make your editor do the work

Whatever text editor you use probably has plugins to make dealing with syntax easier. If you're not using a YAML plugin for your editor, find one and install it. The effort you spend on finding a plugin and configuring it as needed will pay off tenfold the very next time you edit YAML.

For example, the Atom editor comes with a YAML mode by default, and while GNU Emacs ships with minimal support, you can add additional packages like yaml-mode to help.

Emacs in YAML and whitespace mode.

If your favorite text editor lacks a YAML mode, you can address some of your grievances with small configuration changes. For instance, the default text editor for the GNOME desktop, Gedit, doesn't have a YAML mode available, but it does provide YAML syntax highlighting by default and features configurable tab width:

Configuring tab width and type in Gedit.

With the drawspaces Gedit plugin package, you can make white space visible in the form of leading dots, removing any question about levels of indentation.

Take some time to research your favorite text editor. Find out what the editor, or its community, does to make YAML easier, and leverage those features in your work. You won't be sorry.

1. Use a linter

Ideally, programming languages and markup languages use predictable syntax. Computers tend to do well with predictability, so the concept of a linter was invented in 1978. If you're not using a linter for YAML, then it's time to adopt this 40-year-old tradition and use yamllint .

You can install yamllint on Linux using your distribution's package manager. For instance, on Red Hat Enterprise Linux 8 or Fedora :

$ sudo dnf install yamllint

Invoking yamllint is as simple as telling it to check a file. Here's an example of yamllint 's response to a YAML file containing an error:

$ yamllint errorprone.yaml
errorprone.yaml
23:10     error    syntax error: mapping values are not allowed here
23:11     error    trailing spaces  (trailing-spaces)

That's not a time stamp on the left. It's the error's line and column number. You may or may not understand what error it's talking about, but now you know the error's location. Taking a second look at the location often makes the error's nature obvious. Success is eerily silent, so if you want feedback based on the lint's success, you can add a conditional second command with a double-ampersand ( && ). In a POSIX shell, the command after && runs only if the preceding command returns 0, so upon success, your echo command makes that clear. This tactic is somewhat superficial, but some users prefer the assurance that the command did run correctly, rather than failing silently. Here's an example:

$ yamllint perfect.yaml && echo "OK"
OK

The reason yamllint is so silent when it succeeds is that it returns 0 errors when there are no errors.

2. Write in Python, not YAML

If you really hate YAML, stop writing in YAML, at least in the literal sense. You might be stuck with YAML because that's the only format an application accepts, but if the only requirement is to end up in YAML, then work in something else and then convert. Python, along with the excellent pyyaml library, makes this easy, and you have two methods to choose from: self-conversion or scripted.

Self-conversion

In the self-conversion method, your data files are also Python scripts that produce YAML. This works best for small data sets. Just write your JSON data into a Python variable, prepend an import statement, and end the file with a simple three-line output statement.

#!/usr/bin/python3
import yaml

d = {
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}

f = open('output.yaml', 'w')
f.write(yaml.dump(d))
f.close()

Run the file with Python to produce a file called output.yaml:

$ python3 ./example.json
$ cat output.yaml
glossary:
  GlossDiv:
    GlossList:
      GlossEntry:
        Abbrev: ISO 8879:1986
        Acronym: SGML
        GlossDef:
          GlossSeeAlso: [GML, XML]
          para: A meta-markup language, used to create markup languages such as DocBook.
        GlossSee: markup
        GlossTerm: Standard Generalized Markup Language
        ID: SGML
        SortAs: SGML
    title: S
  title: example glossary

This output is perfectly valid YAML, although yamllint does issue a warning that the file is not prefaced with --- , which is something you can adjust either in the Python script or manually.
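
To silence that warning, you can either pass explicit_start=True to yaml.dump() in the script, or simply prepend the marker from the shell. A minimal sketch of the shell approach (output.yaml is the file produced above):

$ printf '%s\n' '---' | cat - output.yaml > tmp && mv tmp output.yaml

Either way, yamllint should then pass the file without complaint.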

Scripted conversion

In this method, you write in JSON and then run a Python conversion script to produce YAML. This scales better than self-conversion, because it keeps the converter separate from the data.

Create a JSON file and save it as example.json . Here is an example from json.org :

{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}

Create a simple converter and save it as json2yaml.py . This script imports both the YAML and JSON Python modules, loads a JSON file defined by the user, performs the conversion, and then writes the data to output.yaml .

#!/usr/bin/python3
import yaml
import sys
import json

OUT=open('output.yaml','w')
IN=open(sys.argv[1], 'r')

JSON = json.load(IN)
IN.close()
yaml.dump(JSON, OUT)
OUT.close()

Save this script in your system path, and execute as needed:

$ ~/bin/json2yaml.py example.json

3. Parse early, parse often

Sometimes it helps to look at a problem from a different angle. If your problem is YAML, and you're having a difficult time visualizing the data's relationships, you might find it useful to restructure that data, temporarily, into something you're more familiar with.

If you're more comfortable with dictionary-style lists or JSON, for instance, you can convert YAML to JSON in a few commands using an interactive Python shell. Assume your YAML file is called mydata.yaml .

$ python3
>>> import yaml
>>> f=open('mydata.yaml','r')
>>> yaml.load(f)
{'document': 34843, 'date': datetime.date(2019, 5, 23), 'bill-to': {'given': 'Seth', 'family': 'Kenlon', 'address': {'street': '51b Mornington Road\n', 'city': 'Brooklyn', 'state': 'Wellington', 'postal': 6021, 'country': 'NZ'}}, 'words': 938, 'comments': 'Good article. Could be better.'}

There are many other examples, and there are plenty of online converters and local parsers, so don't hesitate to reformat data when it starts to look more like a laundry list than markup.

4. Read the spec

After I've been away from YAML for a while and find myself using it again, I go straight back to yaml.org to re-read the spec. If you've never read the specification for YAML and you find YAML confusing, a glance at the spec may provide the clarification you never knew you needed. The specification is surprisingly easy to read, with the requirements for valid YAML spelled out with lots of examples in chapter 6 .

5. Pseudo-config

Before I started writing my book, Developing Games on the Raspberry Pi (Apress, 2019), the publisher asked me for an outline. You'd think an outline would be easy. By definition, it's just the titles of chapters and sections, with no real content. And yet, out of the 300 pages published, the hardest part to write was that initial outline.

YAML can be the same way. You may have a notion of the data you need to record, but that doesn't mean you fully understand how it's all related. So before you sit down to write YAML, try doing a pseudo-config instead.

A pseudo-config is like pseudo-code. You don't have to worry about structure or indentation, parent-child relationships, inheritance, or nesting. You just create iterations of data in the way you currently understand it inside your head.

A pseudo-config.

Once you've got your pseudo-config down on paper, study it, and transform your results into valid YAML.

6. Resolve the spaces vs. tabs debate

OK, maybe you won't definitively resolve the spaces-vs-tabs debate , but you should at least resolve the debate within your project or organization. Whether you resolve this question with a post-process sed script, text editor configuration, or a blood-oath to respect your linter's results, anyone in your team who touches a YAML project must agree to use spaces (in accordance with the YAML spec).

Any good text editor allows you to define a number of spaces instead of a tab character, so the choice shouldn't negatively affect fans of the Tab key.

Tabs and spaces are, as you probably know all too well, essentially invisible. And when something is out of sight, it rarely comes to mind until the bitter end, when you've tested and eliminated all of the "obvious" problems. An hour wasted to an errant tab or group of spaces is your signal to create a policy to use one or the other, and then to develop a fail-safe check for compliance (such as a Git hook to enforce linting).
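
As a rough sketch of such a fail-safe, a pre-commit hook could refuse any commit containing a YAML file that fails the linter (this assumes yamllint is installed on every contributor's machine; the hook path and file extensions are only examples):

#!/bin/sh
# .git/hooks/pre-commit: block the commit if any staged YAML file fails linting
for f in $(git diff --cached --name-only -- '*.yml' '*.yaml'); do
    yamllint "$f" || exit 1
done

Anyone whose YAML doesn't lint cleanly then has to fix it before the commit is accepted.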

7. Less is more (or more is less)

Some people like to write YAML to emphasize its structure. They indent vigorously to help themselves visualize chunks of data. It's a sort of cheat to mimic markup languages that have explicit delimiters.

Here's a good example from Ansible's documentation :

# Employee records
-  martin:
        name: Martin D'vloper
        job: Developer
        skills:
            - python
            - perl
            - pascal
-  tabitha:
        name: Tabitha Bitumen
        job: Developer
        skills:
            - lisp
            - fortran
            - erlang

For some users, this approach is a helpful way to lay out a YAML document, while others lose track of the structure amid what looks like gratuitous white space.

If you own and maintain a YAML document, then you get to define what "indentation" means. If blocks of horizontal white space distract you, then use the minimal amount of white space required by the YAML spec. For example, the same YAML from the Ansible documentation can be represented with fewer indents without losing any of its validity or meaning:

---
- martin:
   name: Martin D'vloper
   job: Developer
   skills:
   - python
   - perl
   - pascal
- tabitha:
   name: Tabitha Bitumen
   job: Developer
   skills:
   - lisp
   - fortran
   - erlang

8. Make a recipe

I'm a big fan of repetition breeding familiarity, but sometimes repetition just breeds repeated stupid mistakes. Luckily, a clever peasant woman experienced this very phenomenon back in 396 AD (don't fact-check me), and invented the concept of the recipe .

If you find yourself making YAML document mistakes over and over, you can embed a recipe or template in the YAML file as a commented section. When you're adding a section, copy the commented recipe and overwrite the dummy data with your new real data. For example:

---
# - <common name>:
#   name: Given Surname
#   job: JOB
#   skills:
#   - LANG
- martin:
  name: Martin D'vloper
  job: Developer
  skills:
  - python
  - perl
  - pascal
- tabitha:
  name: Tabitha Bitumen
  job: Developer
  skills:
  - lisp
  - fortran
  - erlang

9. Use something else

I'm a fan of YAML, generally, but sometimes YAML isn't the answer. If you're not locked into YAML by the application you're using, then you might be better served by some other configuration format. Sometimes config files outgrow themselves and are better refactored into simple Lua or Python scripts.

YAML is a great tool and is popular among users for its minimalism and simplicity, but it's not the only tool in your kit. Sometimes it's best to part ways. One of the benefits of YAML is that parsing libraries are common, so as long as you provide migration options, your users should be able to adapt painlessly.

If YAML is a requirement, though, keep these tips in mind and conquer your YAML hatred once and for all!

[Aug 04, 2019] Ansible IT automation for everybody Enable SysAdmin

Aug 04, 2019 | www.redhat.com

Ansible: IT automation for everybody

Kick the tires with Ansible and start automating with these simple tasks.

Posted July 31, 2019 | by Jörg Kastning

Ansible is an open source tool for software provisioning, application deployment, orchestration, configuration, and administration. Its purpose is to help you automate your configuration processes and simplify the administration of multiple systems. Thus, Ansible essentially pursues the same goals as Puppet, Chef, or Saltstack.

What I like about Ansible is that it's flexible, lean, and easy to start with. In most use cases, it keeps the job simple.

I chose to use Ansible back in 2016 because no agent has to be installed on the managed nodes -- a node is what Ansible calls a managed remote system. All you need to start managing a remote system with Ansible is SSH access to the system, and Python installed on it. Python is preinstalled on most Linux systems, and I was already used to managing my hosts via SSH, so I was ready to start right away. And if the day comes where I decide not to use Ansible anymore, I just have to delete my Ansible controller machine (control node) and I'm good to go. There are no agents left on the managed nodes that have to be removed.

Ansible offers two ways to control your nodes. The first uses playbooks: simple ASCII files written in YAML ("YAML Ain't Markup Language"), which is easy to read and write. The second is ad-hoc commands, which allow you to run a command or module without having to create a playbook first.

You organize the hosts you would like to manage and control in an inventory file, which offers flexible format options. For example, this could be an INI-like file that looks like:

mail.example.com

[webservers]
foo.example.com
bar.example.com

[dbservers]
one.example.com
two.example.com
three.example.com

[site1:children]
webservers
dbservers
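
With groups like these, you can target many hosts at once. As a sketch (using the webservers group from the inventory above and Ansible's command module), an ad-hoc command that runs uptime on every web server might look like:

[jkastning@ansible]$ ansible webservers -m command -a "uptime"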

Examples

I would like to give you two small examples of how to use Ansible. I started with these really simple tasks before I used Ansible to take control of more complex tasks in my infrastructure.

Ad-hoc: Check if Ansible can remote manage a system

As you might recall from the beginning of this article, all you need to manage a remote host is SSH access to it, and a working Python interpreter on it. To check if these requirements are fulfilled, run the following ad-hoc command against a host from your inventory:

[jkastning@ansible]$ ansible mail.example.com -m ping
mail.example.com | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

Playbook: Keep installed packages up to date

This example shows how to use a playbook to keep installed packages up to date. The playbook is an ASCII text file which looks like this:

---
# Make sure all packages are up to date
- name: Update your system
  hosts: mail.example.com
  tasks:
  - name: Make sure all packages are up to date
    yum:
      name: "*"
      state: latest

Now, we are ready to run the playbook:

[jkastning@ansible]$ ansible-playbook yum_update.yml 

PLAY [Update your system] **************************************************************************

TASK [Gathering Facts] *****************************************************************************
ok: [mail.example.com]

TASK [Make sure all packages are up to date] *******************************************************
ok: [mail.example.com]

PLAY RECAP *****************************************************************************************
mail.example.com : ok=2    changed=0    unreachable=0    failed=0

Here everything is OK and there is nothing else to do: all installed packages are already at the latest version.

It's simple: Try and use it

The examples above are quite simple and should only give you a first impression. But, from the start, it did not take me long to use Ansible for more complex tasks like the Poor Man's RHEL Mirror or the Ansible Role for RHEL Patchmanagment .

Today, Ansible saves me a lot of time and supports my day-to-day work tasks quite well. So what are you waiting for? Try it, use it, and feel a bit more comfortable at work.

Jörg Kastning has been a sysadmin for over ten years. He is a member of the Red Hat Accelerators and runs his own blog at https://www.my-it-brain.de.


[Aug 03, 2019] Creating Bootable Linux USB Drive with Etcher

Aug 03, 2019 | linuxize.com

There are several different applications available for free use which will allow you to flash ISO images to USB drives. In this example, we will use Etcher. It is a free and open-source utility for flashing images to SD cards & USB drives and supports Windows, macOS, and Linux.

Head over to the Etcher downloads page , and download the most recent Etcher version for your operating system. Once the file is downloaded, double-click on it and follow the installation wizard.

Creating Bootable Linux USB Drive using Etcher is a relatively straightforward process, just follow the steps outlined below:

  1. Connect the USB flash drive to your system and Launch Etcher.
  2. Click on the Select image button and locate the distribution .iso file.
  3. If only one USB drive is attached to your machine, Etcher will automatically select it. Otherwise, if more than one SD card or USB drive is connected, make sure you have selected the correct USB drive before flashing the image.
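
If you prefer the command line, the same job can be done with standard tools. A rough sketch, assuming the downloaded image is named distro.iso and the USB stick shows up as /dev/sdX (check with lsblk first, because dd will overwrite whatever device you point it at):

$ sha256sum distro.iso       # compare against the checksum published by the distribution
$ sudo dd if=distro.iso of=/dev/sdX bs=4M status=progress conv=fsync

Etcher performs a similar verification step for you after flashing, which is part of its appeal.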

[Jul 30, 2019] FreeDOS turns 25 years old by Jim Hall

Jul 28, 2019 | opensource.com

FreeDOS turns 25 years old: An origin story

The operating system's history is a great example of the open source software model: developers working together to create something.

On June 29, 2019, FreeDOS turned 25 years old.

That's a major milestone for any open source software project, and I'm proud of the work that we've done on it over the past quarter century. I'm also proud of how we built FreeDOS because it is a great example of how the open source software model works.

For its time, MS-DOS was a powerful operating system. I'd used DOS for years, ever since my parents replaced our aging Apple II computer with a newer IBM machine. MS-DOS provided a flexible command line, which I quite liked and that came in handy to manipulate my files. Over the years, I learned how to write my own utilities in C to expand its command-line capabilities even further.

Around 1994, Microsoft announced that its next planned version of Windows would do away with MS-DOS. But I liked DOS. Even though I had started migrating to Linux, I still booted into MS-DOS to run applications that Linux didn't have yet.

I figured that if we wanted to keep DOS, we would need to write our own. And that's how FreeDOS was born.

On June 29, 1994, I made a small announcement about my idea to the comp.os.msdos.apps newsgroup on Usenet.

ANNOUNCEMENT OF PD-DOS PROJECT:
A few months ago, I posted articles relating to starting a public domain version of DOS. The general support for this at the time was strong, and many people agreed with the statement, "start writing!" So, I have

Announcing the first effort to produce a PD-DOS. I have written up a "manifest" describing the goals of such a project and an outline of the work, as well as a "task list" that shows exactly what needs to be written. I'll post those here, and let discussion follow.

While I announced the project as PD-DOS (for "public domain," although the abbreviation was meant to mimic IBM's "PC-DOS"), we soon changed the name to Free-DOS and later FreeDOS.

I started working on it right away. First, I shared the utilities I had written to expand the DOS command line. Many of them reproduced MS-DOS features, including CLS, DATE, DEL, FIND, HELP, and MORE. Some added new features to DOS that I borrowed from Unix, such as TEE and TRCH (a simple implementation of Unix's tr). In all, I contributed over a dozen FreeDOS utilities.

By sharing my utilities, I gave other developers a starting point. And by sharing my source code under the GNU General Public License (GNU GPL), I implicitly allowed others to add new features and fix bugs.

Other developers who saw FreeDOS taking shape contacted me and wanted to help. Tim Norman was one of the first; Tim volunteered to write a command shell (COMMAND.COM, later named FreeCOM). Others contributed utilities that replicated or expanded the DOS command line.

We released our first alpha version as soon as possible. Less than three months after announcing FreeDOS, we had an Alpha 1 distribution that collected our utilities. By the time we released Alpha 5, FreeDOS boasted over 60 utilities. And FreeDOS included features never imagined in MS-DOS, including internet connectivity via a PPP dial-up driver and dual-monitor support using a primary VGA monitor and a secondary Hercules Mono monitor.

New developers joined the project, and we welcomed them. By October 1998, FreeDOS had a working kernel, thanks to Pat Villani. FreeDOS also sported a host of new features that brought not just parity with MS-DOS but surpassed MS-DOS, including ANSI support and a print spooler that resembled Unix lpr.

You may be familiar with other milestones. We crept our way towards the 1.0 label, finally releasing FreeDOS 1.0 in September 2006, FreeDOS 1.1 in January 2012, and FreeDOS 1.2 in December 2016. MS-DOS stopped being a moving target long ago, so we didn't need to update as frequently after the 1.0 release.

Today, FreeDOS is a very modern DOS. We've moved beyond "classic DOS," and now FreeDOS features lots of development tools such as compilers, assemblers, and debuggers. We have lots of editors beyond the plain DOS Edit editor, including Fed, Pico, TDE, and versions of Emacs and Vi. FreeDOS supports networking and even provides a simple graphical web browser (Dillo). And we have tons of new utilities, including many that will make Linux users feel at home.

FreeDOS got where it is because developers worked together to create something. In the spirit of open source software, we contributed to each other's work by fixing bugs and adding new features. We treated our users as co-developers; we always found ways to include people, whether they were writing code or writing documentation. And we made decisions through consensus based on merit. If that sounds familiar, it's because those are the core values of open source software: transparency, collaboration, release early and often, meritocracy, and community. That's the open source way !

I encourage you to download FreeDOS 1.2 and give it a try.

[Jul 26, 2019] How To Check Swap Usage Size and Utilization in Linux by Vivek Gite

Jul 26, 2019 | www.cyberciti.biz

The procedure to check swap space usage and size in Linux is as follows:

  1. Open a terminal application.
  2. To see swap size in Linux, type the command: swapon -s .
  3. You can also refer to the /proc/swaps file to see swap areas in use on Linux.
  4. Type free -m to see both your ram and your swap space usage in Linux.
  5. Finally, one can use the top or htop command to look for swap space utilization on Linux too.

How to Check Swap Space in Linux using /proc/swaps file

Type the following cat command to see total and used swap size:
# cat /proc/swaps
Sample outputs:

Filename                           Type            Size    Used    Priority
/dev/sda3                               partition       6291448 65680   0

Another option is to type the grep command as follows:
grep Swap /proc/meminfo

SwapCached:            0 kB
SwapTotal:        524284 kB
SwapFree:         524284 kB

Look for swap space in Linux using swapon command

Type the following command to show swap usage summary by device
# swapon -s
Sample outputs:

Filename                           Type            Size    Used    Priority
/dev/sda3                               partition       6291448 65680   0

Use free command to monitor swap space usage

Use the free command as follows:
# free -g
# free -k
# free -m

Sample outputs:

             total       used       free     shared    buffers     cached
Mem:         11909      11645        264          0        324       8980
-/+ buffers/cache:       2341       9568
Swap:         6143         64       6079

See swap size in Linux using vmstat command

Type the following vmstat command:
# vmstat
# vmstat 1 5

... ... ...

Vivek Gite is the creator of nixCraft and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting. Get the latest tutorials on SysAdmin, Linux/Unix and open source topics via RSS/XML feed or weekly email newsletter.

[Jul 26, 2019] The day the virtual machine manager died by Nathan Lager

"Dangerous" commands like dd should probably be always typed first in the editor and only when you verity that you did not make a blunder , executed...
A good decision was to go home and think the situation over, not to aggravate it with impulsive attempts to correct the situation, which typically only make it worse.
Lack of checking of the health of backups suggests that this guy is an arrogant sucker, despite his 20 years of sysadmin experience.
Notable quotes:
"... I started dd as root , over the top of an EXISTING DISK ON A RUNNING VM. What kind of idiot does that?! ..."
"... Since my VMs were still running, and I'd already done enough damage for one night, I stopped touching things and went home. ..."
Jul 26, 2019 | www.redhat.com

... ... ...

See, my RHEV manager was a VM running on a stand-alone Kernel-based Virtual Machine (KVM) host, separate from the cluster it manages. I had been running RHEV since version 3.0, before hosted engines were a thing, and I hadn't gone through the effort of migrating. I was already in the process of building a new set of clusters with a new manager, but this older manager was still controlling most of our production VMs. It had filled its disk again, and the underlying database had stopped itself to avoid corruption.

See, for whatever reason, we had never set up disk space monitoring on this system. It's not like it was an important box, right?

So, I logged into the KVM host that ran the VM, and started the well-known procedure of creating a new empty disk file, and then attaching it via virsh . The procedure goes something like this: Become root , use dd to write a stream of zeros to a new file, of the proper size, in the proper location, then use virsh to attach the new disk to the already running VM. Then, of course, log into the VM and do your disk expansion.
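
For readers who haven't done this before, a rough sketch of what that normal procedure looks like (the VM name, image path, and size here are made up):

# on the KVM host, as root: create a NEW, empty disk image
dd if=/dev/zero of=/var/lib/libvirt/images/vmname-disk3.img bs=1M count=40960
# attach it to the running VM as an additional disk
virsh attach-disk vmname /var/lib/libvirt/images/vmname-disk3.img vdb --persistent

The whole point, as the story below makes painfully clear, is that the of= target must be a new file, not one of the existing disk images.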

I logged in, ran sudo -i , and started my work. I ran cd /var/lib/libvirt/images , ran ls -l to find the existing disk images, and then started carefully crafting my dd command:

dd ... bs=1k count=40000000 if=/dev/zero ... of=./vmname-disk ...

Which was the next disk again? <Tab> of=vmname-disk2.img <Back arrow, Back arrow, Back arrow, Back arrow, Backspace> Don't want to dd over the existing disk, that'd be bad. Let's change that 2 to a 3 , and Enter . OH CRAP, I CHANGED THE 2 TO A 2 NOT A 3 ! <Ctrl+C><Ctrl+C><Ctrl+C><Ctrl+C><Ctrl+C><Ctrl+C>

I still get sick thinking about this. I'd done the stupidest thing I possibly could have done, I started dd as root , over the top of an EXISTING DISK ON A RUNNING VM. What kind of idiot does that?! (The kind that's at work late, trying to get this one little thing done before he heads off to see his friend. The kind that thinks he knows better, and thought he was careful enough to not make such a newbie mistake. Gah.)

So, how fast does dd start writing zeros? Faster than I can move my fingers from the Enter key to the Ctrl+C keys. I tried a number of things to recover the running disk from memory, but all I did was make things worse, I think. The system was still up, but still broken like it was before I touched it, so it was useless.

Since my VMs were still running, and I'd already done enough damage for one night, I stopped touching things and went home. The next day I owned up to the boss and co-workers pretty much the moment I walked in the door. We started taking an inventory of what we had, and what was lost. I had taken the precaution of setting up backups ages ago. So, we thought we had that to fall back on.

I opened a ticket with Red Hat support and filled them in on how dumb I'd been. I can only imagine the reaction of the support person when they read my ticket. I worked a help desk for years, I know how this usually goes. They probably gathered their closest coworkers to mourn for my loss, or get some entertainment out of the guy who'd been so foolish. (I say this in jest. Red Hat's support was awesome through this whole ordeal, and I'll tell you how soon. )

So, I figured the next thing I would need from my broken server, which was still running, was the backups I'd diligently been collecting. They were on the VM but on a separate virtual disk, so I figured they were safe. The disk I'd overwritten was the last disk I'd made to expand the volume the database was on, so that logical volume was toast, but I've always set up my servers such that the main mounts -- / , /var , /home , /tmp , and /root -- were all separate logical volumes.

In this case, /backup was an entirely separate virtual disk. So, I scp -r 'd the entire /backup mount to my laptop. It copied, and I felt a little sigh of relief. All of my production systems were still running, and I had my backup. My hope was that these factors would mean a relatively simple recovery: Build a new VM, install RHEV-M, and restore my backup. Simple right?

By now, my boss had involved the rest of the directors, and let them know that we were looking down the barrel of a possibly bad time. We started organizing a team meeting to discuss how we were going to get through this. I returned to my desk and looked through the backups I had copied from the broken server. All the files were there, but they were tiny. Like, a couple hundred kilobytes each, instead of the hundreds of megabytes or even gigabytes that they should have been.

Happy feeling, gone.

Turns out, my backups were running, but at some point after an RHEV upgrade, the database backup utility had changed. Remember how I said this system had existed since version 3.0? Well, 3.0 didn't have an engine-backup utility, so in my RHEV training, we'd learned how to make our own. Mine broke when the tools changed, and for who knows how long, it had been getting an incomplete backup -- just some files from /etc .

No database. Ohhhh ... Fudge. (I didn't say "Fudge.")

I updated my support case with the bad news and started wondering what it would take to break through one of these 4th-floor windows right next to my desk. (Ok, not really.)

At this point, we basically had three RHEV clusters with no manager. One of those was for development work, but the other two were all production. We started using these team meetings to discuss how to recover from this mess. I don't know what the rest of my team was thinking about me, but I can say that everyone was surprisingly supportive and un-accusatory. I mean, with one typo I'd thrown off the entire department. Projects were put on hold and workflows were disrupted, but at least we had time: We couldn't reboot machines, we couldn't change configurations, and couldn't get to VM consoles, but at least everything was still up and operating.

Red Hat support had escalated my SNAFU to an RHEV engineer, a guy I'd worked with in the past. I don't know if he remembered me, but I remembered him, and he came through yet again. About a week in, for some unknown reason (we never figured out why), our Windows VMs started dropping offline. They were still running as far as we could tell, but they dropped off the network, Just boom. Offline. In the course of a workday, we lost about a dozen windows systems. All of our RHEL machines were working fine, so it was just some Windows machines, and not even every Windows machine -- about a dozen of them.

Well great, how could this get worse? Oh right, add a ticking time bomb. Why were the Windows servers dropping off? Would they all eventually drop off? Would the RHEL systems eventually drop off? I made a panicked call back to support, emailed my account rep, and called in every favor I'd ever collected from contacts I had within Red Hat to get help as quickly as possible.

I ended up on a conference call with two support engineers, and we got to work. After about 30 minutes on the phone, we'd worked out the most insane recovery method. We had the newer RHEV manager I mentioned earlier, that was still up and running, and had two new clusters attached to it. Our recovery goal was to get all of our workloads moved from the broken clusters to these two new clusters.

Want to know how we ended up doing it? Well, as our Windows VMs were dropping like flies, the engineers and I came up with this plan. My clusters used a Fibre Channel Storage Area Network (SAN) as their storage domains. We took a machine that was not in use, but had a Fibre Channel host bus adapter (HBA) in it, and attached the logical unit numbers (LUNs) for both the old cluster's storage domains and the new cluster's storage domains to it. The plan there was to make a new VM on the new clusters, attach blank disks of the proper size to the new VM, and then use dd (the irony is not lost on me) to block-for-block copy the old broken VM disk over to the newly created empty VM disk.

I don't know if you've ever delved deeply into an RHEV storage domain, but under the covers it's all Logical Volume Manager (LVM). The problem is, the LV's aren't human-readable. They're just universally-unique identifiers (UUIDs) that the RHEV manager's database links from VM to disk. These VMs are running, but we don't have the database to reference. So how do you get this data?

virsh ...

Luckily, I managed KVM and Xen clusters long before RHEV was a thing that was viable. I was no stranger to libvirt 's virsh utility. With the proper authentication -- which the engineers gave to me -- I was able to virsh dumpxml on a source VM while it was running, get all the info I needed about its memory, disk, CPUs, and even MAC address, and then create an empty clone of it on the new clusters.
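
For instance, a sketch of that first step (the domain name here is hypothetical):

$ virsh dumpxml broken-vm > broken-vm.xml
$ grep -E '<memory|<vcpu|mac address|source file' broken-vm.xml

The dumped XML contains everything needed to recreate an equivalent, empty VM on the new cluster.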

Once I felt everything was perfect, I would shut down the VM on the broken cluster with either virsh shutdown , or by logging into the VM and shutting it down. The catch here is that if I missed something and shut down that VM, there was no way I'd be able to power it back on. Once the data was no longer in memory, the config would be completely lost, since that information is all in the database -- and I'd hosed that. Once I had everything, I'd log into my migration host (the one that was connected to both storage domains) and use dd to copy, bit-for-bit, the source storage domain disk over to the destination storage domain disk. Talk about nerve-wracking, but it worked! We picked one of the broken windows VMs and followed this process, and within about half an hour we'd completed all of the steps and brought it back online.

We did hit one snag, though. See, we'd used snapshots here and there. RHEV snapshots are lvm snapshots. Consolidating them without the RHEV manager was a bit of a chore, and took even more leg work and research before we could dd the disks. I had to mimic the snapshot tree by creating symbolic links in the right places, and then start the dd process. I worked that one out late that evening after the engineers were off, probably enjoying time with their families. They asked me to write the process up in detail later. I suspect that it turned into some internal Red Hat documentation, never to be given to a customer because of the chance of royally hosing your storage domain.

Somehow, over the course of 3 months and probably a dozen scheduled maintenance windows, I managed to migrate every single VM (of about 100 VMs) from the old zombie clusters to the working clusters. This migration included our Zimbra collaboration system (10 VMs in itself), our file servers (another dozen VMs), our Enterprise Resource Planning (ERP) platform, and even Oracle databases.

We didn't lose a single VM and had no more unplanned outages. The Red Hat Enterprise Linux (RHEL) systems, and even some Windows systems, never fell to the mysterious drop-off that those dozen or so Windows servers did early on. During this ordeal, though, I had trouble sleeping. I was stressed out and felt so guilty for creating all this work for my co-workers, I even had trouble eating. No exaggeration, I lost 10lbs.

So, don't be like Nate. Monitor your important systems, check your backups, and for all that's holy, double-check your dd output file. That way, you won't have drama, and can truly enjoy Sysadmin Appreciation Day!

Nathan Lager is an experienced sysadmin, with 20 years in the industry. He runs his own blog at undrground.org, and hosts the Iron Sysadmin Podcast. More about me

[Jul 13, 2019] Find lost files with Scalpel - Enable SysAdmin

Jul 13, 2019 | www.redhat.com

As a system administrator, part of your responsibility is to help users manage their data. One of the vital aspects of doing that is to ensure your organization has a good backup plan, and that your users either make their backups regularly, or else don't have to because you've automated the process.

However, sometimes the worst happens. A file gets deleted by mistake, a filesystem becomes corrupt, or a partition gets lost, and for whatever reason, the backups don't contain what you need.

As we discussed in How to prevent and recover from accidental file deletion in Linux , before trying to recover lost data, you must find out why the data is missing in the first place. It's possible that a user has simply misplaced the file, or that there is a backup that the user isn't aware of. But if a user has indeed removed a file with no backups, then you know you need to recover a deleted file. If a partition table has become scrambled, though, then the files aren't really lost at all, and you might want to consider using TestDisk to recover the partition table, or the partition itself.

What happens if your file or partition recovery isn't successful, or is only in part? Then it's time for Scalpel . Scalpel performs file carving operations based on patterns describing unique file types. It looks for these patterns based on binary strings and regular expressions, and then extracts the file accordingly.

This tool isn't currently being maintained, but it's ever-reliable, compiling and running exactly as expected. If you're running Red Hat Enterprise Linux (RHEL) 7, RHEL 8, or Fedora , you can download Scalpel's RPM installers, along with its dependency, libtre , from klaatu.fedorapeople.org .

Starting with Scalpel

Scalpel comes bundled with a comprehensive list of file types and their most unique identifying features. Sometimes, a file can be identified by predictable text at its head and tail:

htm    n    50000   <html         </html>

While at other times, cryptic-looking hex codes are necessary:

jpg    y   200000000    \xff\xd8\xff\xe0\x00\x10    \xff\xd9

Scalpel expects you to duplicate /etc/scalpel.conf and edit your copy to include the file types you hope to recover, and to exclude the file types you know you don't need. For instance, if you know you don't have or care about .fws files, then comment that line out of the file. Doing this can speed up the recovery process and reduce false positives.

In the configuration file, the format of a file definition is, from left to right:

  1. The file extension
  2. A case-sensitivity flag (y or n)
  3. The maximum size, in bytes, to carve for that file type
  4. The header pattern
  5. The footer pattern

The footer field is optional. If no footer is provided, then Scalpel extracts the number of bytes you set as the file type's maximum value.

You might find that a recovery effort only rescues part of a file, such as this mostly-recovered JPG:

This result means that you probably need to increase the maximum size value for that file type, and then re-scan, so that the end of the file can be recovered, too.

Defining new file types

First, make a copy of the Scalpel configuration file. If all your users generate similar data, then you may only need one config file for your entire organization. Or, you might find it better to have one config file per department.

To add your own file types to a Scalpel config, start with some investigative forensics.

For text files, you ideally have some predictable structure you can anticipate. For instance, an XML file probably starts with <xml and ends with </xml . Binary files are similarly predictable. Using the hexdump command, you can view a typical header from the file type you want to define. Here are the results for an XCF, the default layered graphic file from GIMP:

$ head --bytes 8 example.xcf | hexdump --canonical
00000000  67 69 6d 70 20 78 63 66         |gimp xcf|
00000008

This output is from a Red Hat Enterprise Linux 8 system. On older systems, an older syntax may be necessary:

$ head --bytes 8 example.xcf | hexdump -C
00000000  67 69 6d 70 20 78 63 66         |gimp xcf|
00000008

The canonical output of hexdump displays the address in the far left column, and the decoded values on the far right. In the center column are the hexadecimal bytes of the first 8 bytes of the XCF file's first line.

Most binary files in /etc/scalpel.conf look pretty similar to that output, except that these values are prefaced with the \x escape sequence to denote that the numbers are actually hexadecimal digits. For instance, a JPG file looks like this in the configuration file:

jpg     y     200000000     \xff\xd8\xff\xe0\x00\x10     \xff\xd9

Compare that value with a test hexdump of the first 6 bytes (because that's how many bytes scalpel.conf contains in its JPG definition) of any JPG file on your system:

$ head --bytes 6 example.jpg | hexdump --canonical
00000000  ff d8 ff e0 00 10                    |......|
00000006

Compare the footer with the last 2 bytes to match what the config file shows:

$ tail --bytes -2 example.jpg | hexdump --canonical
00000000  ff d9                        |..|
00000002

These values match up, so you can be confident that valid JPG files probably all start and end in a predictable sequence.

Note: The Ogg entry in the scalpel.conf file is misleading, as it lacks the \x escape sequence. If you need to recover an Ogg file, fix this, or replace its definition.

Getting to work

Now, obtain the same level of confidence for every file type you need to recover (such as the XCF type in the previous example). To reiterate, this is your workflow for defining the binary file types common to the victim drive:

  1. Get the hexadecimal values of the first few bytes of a file type using the head --bytes n command.
  2. Get the last few bytes using the tail --bytes -n command.
  3. Repeat this process on several different files of the same type to confirm consistency of this pattern, adjusting the length of your header and footer patterns as required.
  4. Enter the header and footer values into your custom Scalpel config, using the \x notation to identify each byte as a hexadecimal character.

Follow this sequence for each important binary file type you need to recover.
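
As a concrete sketch, an entry for the XCF example above might look like the following (the maximum size is a guess you should adjust for your data, and the footer is omitted because XCF files have no reliable trailing pattern):

xcf     y     200000000     \x67\x69\x6d\x70\x20\x78\x63\x66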

If a file is plaintext, provide a common header and footer, such as #!/bin/sh for shell scripts, # (the space after the # is important) for markdown files with an h1 level title, <xml for XML files, and so on.

When you're ready to run Scalpel, create a directory where it can place your rescued files:

$ mkdir /run/media/seth/rescuer/scalped

Note: Do not create this directory on the same volume that contains the lost data.

If the victim drive is not yet mounted, mount it, and then run Scalpel:

$ scalpel -c my-scalpel.conf \
  -o /run/media/seth/rescuer/scalped \
  /run/media/seth/victim

You can also run Scalpel on a disk image:

$ scalpel -c my-scalpel.conf \
  -o ~/scalped ~/victim.img

When Scalpel is done, review the files in your designated rescue directory.

All in all, it's best to make backups so you can avoid doing file recovery at all. But, should the worst happen, try Scalpel and carve carefully.

[Jun 23, 2019] Utilizing multi core for tar+gzip-bzip compression-decompression

Highly recommended!
Notable quotes:
"... There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files. ..."
"... You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use. ..."
Jun 23, 2019 | stackoverflow.com

user1118764 , Sep 7, 2012 at 6:58

I normally compress using tar zcvf and decompress using tar zxvf (using gzip due to habit).

I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores are unused during compression/decompression.

Is there any way I can utilize the unused cores to make it faster?

Warren Severin , Nov 13, 2017 at 4:37

The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and installed tar from source: gnu.org/software/tar I included the options mentioned in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I ran the backup again and it took only 32 minutes. That's better than 4X improvement! I watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole time. THAT is the best solution. – Warren Severin Nov 13 '17 at 4:37

Mark Adler , Sep 7, 2012 at 14:48

You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz

By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.

tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz

user788171 , Feb 20, 2013 at 12:43

How do you use pigz to decompress in the same fashion? Or does it only work for compression?

Mark Adler , Feb 20, 2013 at 16:18

pigz does use multiple cores for decompression, but only with limited improvement over a single core. The deflate format does not lend itself to parallel decompression.

The decompression portion must be done serially. The other cores for pigz decompression are used for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets close to a factor of n improvement with n cores.
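
For completeness, a sketch of decompressing such an archive with pigz (either form should behave the same; the archive name is just an example):

$ pigz -dc archive.tar.gz | tar xf -
$ tar -I pigz -xf archive.tar.gz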

Garrett , Mar 1, 2014 at 7:26

The hyphen here is stdout (see this page ).

Mark Adler , Jul 2, 2014 at 21:29

Yes. 100% compatible in both directions.

Mark Adler , Apr 23, 2015 at 5:23

There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files.

Jen , Jun 14, 2013 at 14:34

You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.

For example use:

tar -c --use-compress-program=pigz -f tar.file dir_to_zip

Valerio Schiavoni , Aug 5, 2014 at 22:38

Unfortunately by doing so the concurrent feature of pigz is lost. You can see for yourself by executing that command and monitoring the load on each of the cores. – Valerio Schiavoni Aug 5 '14 at 22:38

bovender , Sep 18, 2015 at 10:14

@ValerioSchiavoni: Not here, I get full load on all 4 cores (Ubuntu 15.04 'Vivid'). – bovender Sep 18 '15 at 10:14

Valerio Schiavoni , Sep 28, 2015 at 23:41

On compress or on decompress ? – Valerio Schiavoni Sep 28 '15 at 23:41

Offenso , Jan 11, 2017 at 17:26

I prefer tar - dir_to_zip | pv | pigz > tar.file pv helps me estimate, you can skip it. But still it easier to write and remember. – Offenso Jan 11 '17 at 17:26

Maxim Suslov , Dec 18, 2014 at 7:31

Common approach

There is option for tar program:

-I, --use-compress-program PROG
      filter through PROG (must accept -d)

You can use multithread version of archiver or compressor utility.

Most popular multithread archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:

$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive

Archiver must accept -d. If your replacement utility hasn't this parameter and/or you need specify additional parameters, then use pipes (add parameters if necessary):

$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.gz
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz

Input and output of singlethread and multithread are compatible. You can compress using multithread version and decompress using singlethread version and vice versa.

p7zip

For p7zip for compression you need a small shell script like the following:

#!/bin/sh
case $1 in
  -d) 7za -txz -si -so e;;
   *) 7za -txz -si -so a .;;
esac 2>/dev/null

Save it as 7zhelper.sh. Here the example of usage:

$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z

xz

Regarding multithreaded XZ support. If you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environmental variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0" ).

This is a fragment of man for 5.1.0alpha version:

Multithreaded compression and decompression are not implemented yet, so this option has no effect for now.

However this will not work for decompression of files that haven't also been compressed with threading enabled. From man for version 5.2.2:

Threaded decompression hasn't been implemented yet. It will only work on files that contain multiple blocks with size information in block headers. All files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if --block-size=size is used.

Recompiling with replacement

If you build tar from sources, then you can recompile with parameters

--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip

After recompiling tar with these options you can check the output of tar's help:

$ tar --help | grep "lbzip2\|plzip\|pigz"
  -j, --bzip2                filter the archive through lbzip2
      --lzip                 filter the archive through plzip
  -z, --gzip, --gunzip, --ungzip   filter the archive through pigz

mpibzip2 , Apr 28, 2015 at 20:57

I just found pbzip2 and mpibzip2 . mpibzip2 looks very promising for clusters or if you have a laptop and a multicore desktop computer for instance. – user1985657 Apr 28 '15 at 20:57

oᴉɹǝɥɔ , Jun 10, 2015 at 17:39

Processing STDIN may in fact be slower. – oᴉɹǝɥɔ Jun 10 '15 at 17:39

selurvedu , May 26, 2016 at 22:13

Plus 1 for xz option. It the simplest, yet effective approach. – selurvedu May 26 '16 at 22:13

panticz.de , Sep 1, 2014 at 15:02

You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:
tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/

einpoklum , Feb 11, 2017 at 15:59

A nice TL;DR for @MaximSuslov's answer . – einpoklum Feb 11 '17 at 15:59

If you want to have more flexibility with filenames and compression options, you can use:
find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec \
tar -P --transform='s@/my/path/@@g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz

Step 1: find

find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec

This command will look for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log . Add as many -o -name "pattern" as you want.

-exec will execute the next command using the results of find : tar

Step 2: tar

tar -P --transform='s@/my/path/@@g' -cf - {} +

--transform is a simple string replacement parameter. It will strip the path of the files from the archive so the tarball's root becomes the current directory when extracting. Note that you can't use -C option to change directory as you'll lose benefits of find : all files of the directory would be included.

-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.

-cf - tells tar to use the tarball name we'll specify later

{} + passes every file that find found to the tar command.

Step 3: pigz

pigz -9 -p 4

Use as many parameters as you want. In this case -9 is the compression level and -p 4 is the number of cores dedicated to compression. If you run this on a heavily loaded web server, you probably don't want to use all available cores.

Step 4: archive name

> myarchive.tar.gz

Finally.

[Jun 23, 2019] Relationship between Fedora version and CentOS version

Notable quotes:
"... F19 -> RHEL7 ..."
Jun 23, 2019 | ask.fedoraproject.org

CentOS is based on RHEL, and the relationship between Fedora and RHEL versions can be found in the Wikipedia article about RHEL.

In short, there is no "rule"

That link should work:

http://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#Relationship_to_free_and_community_distributions

jmt ( Oct 25 '13 )

Updated link: https://en.wikipedia.org/wiki/Red_Hat... GregMartyn ( Mar 9 '16 )

[Jun 22, 2019] Using SSH X session forwarding by Seth Kenlon

Jun 22, 2019 | www.redhat.com

Normally, you would forward a remote computer's X11 graphical display to your local computer with the -X option, but the OpenSSH application places additional security limits on such connections as a precaution. As long as you're starting a shell on a trusted machine, you can use the -Y option to opt out of the excess security:

$ ssh -Y 93.184.216.34

Now you can launch an instance of any one of the remote computer's applications, but have it appear on your screen. For instance, try launching the Nautilus file manager:

remote$ nautilus &

The result is a Nautilus file manager window on your screen, displaying files on the remote computer. Your user can't see the window you're seeing, but at least you have graphical access to what they are using. Through this, you can debug, modify settings, or perform actions that are otherwise unavailable through a normal text-based SSH session.

Keep in mind, though, that a forwarded X11 session does not bring the whole remote session to you. You don't have access to the target computer's audio playback, for example, though you can make the remote system play audio through its speakers. You also can't access any custom application themes on the target computer, and so on (at least, not without some skillful redirection of environment variables).

However, if you only need to view files or use an application that you don't have access to locally, forwarding X can be invaluable.

[Jun 22, 2019] Using SSH and Tmux for screen sharing Enable by Seth Kenlon Tmux

Jun 22, 2019 | www.redhat.com

Tmux is a screen multiplexer, meaning that it provides your terminal with virtual terminals, allowing you to switch from one virtual session to another. Modern terminal emulators feature a tabbed UI, making the use of Tmux seem redundant, but Tmux has a few peculiar features that still prove difficult to match without it.

First of all, you can launch Tmux on a remote machine, start a process running, detach from Tmux, and then log out. In a normal terminal, logging out would end the processes you started. Since those processes were started in Tmux, they persist even after you leave.

Secondly, Tmux can "mirror" its session on multiple screens. If two users log into the same Tmux session, then they both see the same output on their screens in real time.

Tmux is a lightweight, simple, and effective solution in cases where you're training someone remotely, debugging a command that isn't working for them, reviewing text, monitoring services or processes, or just avoiding the ten minutes it sometimes takes to read commands aloud over a phone clearly enough that your user is able to accurately type them.

To try this option out, you must have two computers. Assume one computer is owned by Alice, and the other by Bob. Alice remotely logs into Bob's PC and launches a Tmux session:

alice$ ssh bob.local
alice$ tmux

On his PC, Bob starts Tmux, attaching to the same session:

bob$ tmux attach

When Alice types, Bob sees what she is typing, and when Bob types, Alice sees what he's typing.

It's a simple but effective trick that enables interactive live sessions between computer users, but it is entirely text-based.

Collaboration

With these two applications, you have access to some powerful methods of supporting users. You can use these tools to manage systems remotely, as training tools, or as support tools, and in every case, it sure beats wandering around the office looking for somebody's desk. Get familiar with SSH and Tmux, and start using them today.

[Jun 17, 2019] Accessing remote desktops by Seth Kenlon

Jun 17, 2019 | www.redhat.com

Accessing remote desktops Need to see what's happening on someone else's screen? Here's what you need to know about accessing remote desktops.

Posted June 13, 2019 | by Seth Kenlon (Red Hat)

Anyone who's worked a support desk has had the experience: sometimes, no matter how descriptive your instructions, and no matter how concise your commands, it's just easier and quicker for everyone involved to share screens. Likewise, anyone who's ever maintained a server located in a loud and chilly data center -- or across town, or the world -- knows that often a remote viewer is the easiest method for viewing distant screens.

Linux is famously capable of being managed without seeing a GUI, but that doesn't mean you have to manage your box that way. If you need to see the desktop of a computer that you're not physically in front of, there are plenty of tools for the job.

Barriers

Half the battle of successful screen sharing is getting into the target computer. That's by design, of course. It should be difficult to get into a computer without explicit consent.

Usually, there are up to three barriers to accessing a remote machine:

  1. The network firewall
  2. The target computer's firewall
  3. Screen share settings

Specific instructions on how to get past each barrier are impossible to give: every network and every computer is configured uniquely. But here are some possible solutions.

Barrier 1: The network firewall

A network firewall is the target computer's LAN entry point, often a part of the router (whether an appliance from an Internet Service Provider or a dedicated server in a rack). In order to pass through the firewall and access a computer remotely, your network firewall must be configured so that the appropriate port for the remote desktop protocol you're using is accessible.

The most common, and most universal, protocol for screen sharing is VNC.

If the network firewall is on a Linux server you can access, you can broadly allow VNC traffic to pass through using firewall-cmd , first by getting your active zone, and then by allowing VNC traffic in that zone:

$ sudo firewall-cmd --get-active-zones
example-zone
  interfaces: enp0s31f6
$ sudo firewall-cmd --add-service=vnc-server --zone=example-zone

If you're not comfortable allowing all VNC traffic into the network, add a rich rule to firewalld in order to let in VNC traffic from only your IP address. For example, using an example IP address of 93.184.216.34, a rule to allow VNC traffic is:

$ sudo firewall-cmd \
--add-rich-rule='rule family="ipv4" source address="93.184.216.34" service name=vnc-server accept'

To ensure the firewall changes were made, reload the rules:

$ sudo firewall-cmd --reload
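For the rule to survive a reload or a reboot it must also be added to the permanent configuration; a minimal sketch reusing the example zone from above:

$ sudo firewall-cmd --permanent --zone=example-zone --add-service=vnc-server
$ sudo firewall-cmd --reload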

If network reconfiguration isn't possible, see the section "Screen sharing through a browser."

Barrier 2: The computer's firewall

Most personal computers have built-in firewalls. Users who are mindful of security may actively manage their firewall. Others, though, blissfully trust their default settings. This means that when you're trying to access their computer for screen sharing, their firewall may block incoming remote connection requests without the user even realizing it. Your request to view their screen may successfully pass through the network firewall only to be silently dropped by the target computer's firewall.

[Image: Changing zones in Linux]

To remedy this problem, have the user either lower their firewall or, on Fedora and RHEL, place their computer into the trusted zone. Do this only for the duration of the screen sharing session. Alternatively, have them add one of the rules you added to the network firewall (if your user is on Linux).
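A sketch of temporarily moving the user's interface into the trusted zone, reusing the interface name enp0s31f6 from the earlier example (the original zone, assumed here to be public, should be restored when the session ends):

$ sudo firewall-cmd --zone=trusted --change-interface=enp0s31f6
... and afterwards, to move it back:
$ sudo firewall-cmd --zone=public --change-interface=enp0s31f6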

A reboot is a simple way to ensure the new firewall setting is instantiated, so that's probably the easiest next step for your user. Power users can instead reload the firewall rules manually :

$ sudo firewall-cmd --reload

If you have a user override their computer's default firewall, remember to close the session by instructing them to re-enable the default firewall zone. Don't leave the door open behind you!

Barrier 3: The computer's screen share settings

To share another computer's screen, the target computer must be running remote desktop software (technically, a remote desktop server , since this software listens to incoming requests). Otherwise, you have nothing to connect to.

Some desktops, like GNOME, provide screen sharing options, which means you don't have to launch a separate screen sharing application. To activate screen sharing in GNOME, open Settings and select Sharing from the left column. In the Sharing panel, click on Screen Sharing and toggle it on:

Remote desktop viewers

There are a number of remote desktop viewers out there. Here are some of the best options.

GNOME Remote Desktop Viewer

The GNOME Remote Desktop Viewer application is codenamed Vinagre . It's a simple application that supports multiple protocols, including VNC, Spice, RDP, and SSH. Vinagre's interface is intuitive, and yet this application offers many options, including whether you want to control the target computer or only view it.

If Vinagre's not already installed, use your distribution's package manager to add it. On Red Hat Enterprise Linux and Fedora , use:

$ sudo dnf install vinagre

In order to open Vinagre, go to the GNOME desktop's Activities menu and launch Remote Desktop Viewer . Once it opens, click the Connect button in the top left corner. In the Connect window that appears, select the VNC protocol. In the Host field, enter the IP address of the computer you're connecting to. If you want to use the computer's hostname instead, you must have a valid DNS service in place, or Avahi , or entries in /etc/hosts . Do not prepend your entry with a username.

Select any additional options you prefer, and then click Connect .

If you use the GNOME Remote Desktop Viewer as a full-screen application, move your mouse to the screen's top center to reveal additional controls. Most importantly, the exit fullscreen button.

If you're connecting to a Linux virtual machine, you can use the Spice protocol instead. Spice is robust, lightweight, and transmits both audio and video, usually with no noticeable lag.

TigerVNC and TightVNC

Sometimes you're not on a Linux machine, so the GNOME Remote Desktop Viewer isn't available. As usual, open source has an answer. In fact, open source has several answers, but two popular ones are TigerVNC and TightVNC , which are both cross-platform VNC viewers. TigerVNC offers separate downloads for each platform, while TightVNC has a universal Java client.

Both of these clients are simple, with additional options included in case you need them. The defaults are generally acceptable. In order for these particular clients to connect, turn off the encryption setting for GNOME's embedded VNC server (codenamed Vino) as follows:

$ gsettings set org.gnome.Vino require-encryption false

This modification must be done on the target computer before you attempt to connect, either in person or over SSH.
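When the screen sharing session is over, the same setting can be restored on the target computer:

$ gsettings set org.gnome.Vino require-encryption true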

[Image: Red Hat Enterprise Linux 7 remoted to RHEL 8 with TightVNC]

Use the option for an SSH tunnel to ensure that your VNC connection is fully encrypted.
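A sketch of such a tunnel, assuming the target's VNC server listens on display :0 (TCP port 5900) and target.example.com is a placeholder hostname; the VNC viewer then connects to localhost:5901 instead of the remote address:

$ ssh -L 5901:localhost:5900 user@target.example.com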

Screen sharing through a browser

If network re-configuration is out of the question, sharing over an online meeting or collaboration platform is yet another option. The best open source platform for this is Nextcloud , which offers screen sharing over plain old HTTPS. With no firewall exceptions and no additional encryption required, Nextcloud's Talk app provides video and audio chat, plus whole-screen sharing using WebRTC technology.

This option requires a Nextcloud installation, but given that it's the best open source groupware package out there, it's probably worth looking at if you're not already running an instance. You can install Nextcloud yourself, or you can purchase hosting from Nextcloud.

To install the Talk app, go to Nextcloud's app store. Choose the Social & Communication category and then select the Talk plugin.

Next, add a user for the target computer's owner. Have them log into Nextcloud, and then click on the Talk app in the top left of the browser window.

When you start a new chat with your user, they'll be prompted by their browser to allow notifications from Nextcloud. Whether they accept or decline, Nextcloud's interface alerts them of the incoming call in the notification area at the top right corner.

Once you're in the call with your remote user, have them click on the Share screen button at the bottom of their chat window.

Remote screens

Screen sharing can be an easy method of support as long as you plan ahead so your network and clients support it from trusted sources. Integrate VNC into your support plan early, and use screen sharing to help your users get better at what they do.

Seth Kenlon is a free culture advocate and UNIX geek.


[May 24, 2019] Deal with longstanding issues like government favoritism toward local companies

May 24, 2019 | theregister.co.uk

How is it that this can be a point of contention? Name me one country in this world that doesn't favor local companies.

These company representatives who are complaining about local favoritism would be howling like wolves if Huawei were given favor in the US over any one of them.

I'm not saying that there are no reasons to be unhappy about business with China, but that is not one of them.


A.P. Veening , 1 day

Re: "deal with longstanding issues like government favoritism toward local companies"

Name me one country in this world that doesn't favor local companies.

I'll give you two: Liechtenstein and Vatican City, though admittedly neither has a lot of local companies.

STOP_FORTH , 1 day
Re: "deal with longstanding issues like government favoritism toward local companies"

Doesn't Liechtenstein make most of the dentures in the EU? Try taking a bite out of that market.

Kabukiwookie , 1 day
Re: "deal with longstanding issues like government favoritism toward local companies"

How can you leave Andorra out of that list?

A.P. Veening , 14 hrs
Re: "deal with longstanding issues like government favoritism toward local companies"

While you are at it, how can you leave Monaco and San Marino out of that list?

[May 24, 2019] Huawei equipment can't be trusted? As distinct from Cisco which we already have backdoored :]

May 24, 2019 | theregister.co.uk

" The Trump administration, backed by US cyber defense experts, believes that Huawei equipment can't be trusted " .. as distinct from Cisco which we already have backdoored :]

Sir Runcible Spoon
Re: Huawei equipment can't be trusted?

Didn't someone once say "I don't trust anyone who can't be bribed"?

Not sure why that popped into my head.

[May 24, 2019] The USA isn't annoyed at Huawei spying, they are annoyed that Huawei isn't spying for them

May 24, 2019 | theregister.co.uk

Pick your poison

The USA isn't annoyed at Huawei spying; they are annoyed that Huawei isn't spying for them. If you don't use Huawei, who would you use instead? Cisco? Yes, just open up and let the NSA ream your ports. Oooo, filthy.

If you don't know the chip design, can't verify the construction, don't know the code and can't verify the deployment to the hardware; you are already owned.

The only question is: which state actor? China, USA, Israel, UK...? -- Anonymous Coward

[May 24, 2019] This is going to get ugly

May 24, 2019 | theregister.co.uk

..and we're all going to be poorer for it. Americans, Chinese and bystanders.

I was recently watching the WW1 channel on youtube (awesome thing, go Indy and team!) - the delusion, lack of situational understanding and short sightedness underscoring the actions of the main actors that started the Great War can certainly be paralleled to the situation here.

The very idea that you can manage to send China 40 years back in time with no harm on your side is bonkers.

[May 24, 2019] Networks are usually highly segmented and protected via firewalls and proxies, so access to routers from the Internet is impossible

You can put a backdoor in a router; the problem is that you will never be able to access it. Also, for important deployments, countries inspect the source code of the firmware. The USA is playing dirty games here, no matter whether the Chinese are right or wrong.
May 24, 2019 | theregister.co.uk
Re: Technological silos

They're not necessarily silos. If you design a network as a flat space with all interactions peer to peer, then you have set yourself the problem of ensuring all nodes on that network are secure and enforcing traffic rules equally on each node. This is impractical -- it's not that it couldn't be done, but it's a huge waste of resources. A more practical strategy is to layer the network, providing choke points where traffic can be monitored and managed. We currently do this with firewalls and demilitarized zones, the goal normally being to prevent unwanted traffic coming in (although it can be used to monitor and control traffic going out). This has nothing to do with incompatible standards.

I'm not sure about the rest of the FUD in this article. Yes, it's all very complicated. But just as we have to know how to layer our networks, we also know how to manage our information. For example, anyone who has a smartphone on which they co-mingle sensitive data and public access, relying on the integrity of its software to keep everything separate, is just plain asking for trouble. Quite apart from the risk of data leakage between applications, it's a portable device that can get lost, stolen or confiscated (and duplicated.....). Use common sense. Manage your data.

[May 24, 2019] Internet and phones aren't the issue. Its the chips

Notable quotes:
"... The real issue is the semiconductors - the actual silicon. ..."
"... China has some fabs now, but far too few to handle even just their internal demand - and tech export restrictions have long kept their leading edge capabilities significantly behind the cutting edge. ..."
"... On the flip side: Foxconn, Huawei et al are so ubiquitous in the electronics global supply chain that US retail tech companies - specifically Apple - are going to be severely affected, or at least extremely vulnerable to being pushed forward as a hostage. ..."
May 24, 2019 | theregister.co.uk

Duncan Macdonald

Internet, phones, Android aren't the issue - except if the US is able to push China out of GSM/ITU.

The real issue is the semiconductors - the actual silicon.

The majority of raw silicon wafers as well as the finished chips are created in the US or its most aligned allies: Japan, Taiwan. The dominant manufacturers of semiconductor equipment are also largely US with some Japanese and EU suppliers.

If Fabs can't sell to China, regardless of who actually paid to manufacture the chips, because Applied Materials has been banned from any business related to China, this is pretty severe for 5-10 years until the Chinese can ramp up their capacity.

China has some fabs now, but far too few to handle even just their internal demand - and tech export restrictions have long kept their leading edge capabilities significantly behind the cutting edge.

On the flip side: Foxconn, Huawei et al are so ubiquitous in the electronics global supply chain that US retail tech companies - specifically Apple - are going to be severely affected, or at least extremely vulnerable to being pushed forward as a hostage.

Interesting times...

[May 24, 2019] We shared and the Americans shafted us. And now *they* are bleating about people not respecting Intellectual Property Rights?

Notable quotes:
"... The British aerospace sector (not to be confused with the company of a similar name but more Capital Letters) developed, amongst other things, the all-flying tailplane, successful jet-powered VTOL flight, noise-and drag-reducing rotor blades and the no-tailrotor systems and were promised all sorts of crunchy goodness if we shared it with our wonderful friends across the Atlantic. ..."
"... We shared and the Americans shafted us. Again. And again. And now *they* are bleating about people not respecting Intellectual Property Rights? ..."
May 24, 2019 | theregister.co.uk

Anonymous Coward

Sic semper tyrannis

"Without saying so publicly, they're glad there's finally some effort to deal with longstanding issues like government favoritism toward local companies, intellectual property theft, and forced technology transfers."

The British aerospace sector (not to be confused with the company of a similar name but more Capital Letters) developed, amongst other things, the all-flying tailplane, successful jet-powered VTOL flight, noise- and drag-reducing rotor blades and no-tailrotor systems, and were promised all sorts of crunchy goodness if we shared it with our wonderful friends across the Atlantic.

We shared and the Americans shafted us. Again. And again. And now *they* are bleating about people not respecting Intellectual Property Rights?

And as for moaning about backdoors in Chinese kit, who do Cisco et al report to again? Oh yeah, those nice Three Letter Acronym people loitering in Washington and Langley...

[May 24, 2019] Oh dear. Secret Huawei enterprise router snoop 'backdoor' was Telnet service, sighs Vodafone The Register

May 24, 2019 | theregister.co.uk

A claimed deliberate spying "backdoor" in Huawei routers used in the core of Vodafone Italy's 3G network was, in fact, a Telnet -based remote debug interface.

The Bloomberg financial newswire reported this morning that Vodafone had found "vulnerabilities going back years with equipment supplied by Shenzhen-based Huawei for the carrier's Italian business".

"Europe's biggest phone company identified hidden backdoors in the software that could have given Huawei unauthorized access to the carrier's fixed-line network in Italy," wailed the newswire.

Unfortunately for Bloomberg, Vodafone had a far less alarming explanation for the deliberate secret "backdoor" – a run-of-the-mill LAN-facing diagnostic service, albeit a hardcoded undocumented one.

"The 'backdoor' that Bloomberg refers to is Telnet, which is a protocol that is commonly used by many vendors in the industry for performing diagnostic functions. It would not have been accessible from the internet," said the telco in a statement to The Register , adding: "Bloomberg is incorrect in saying that this 'could have given Huawei unauthorized access to the carrier's fixed-line network in Italy'.

"This was nothing more than a failure to remove a diagnostic function after development."

It added the Telnet service was found during an audit, which means it can't have been that secret or hidden: "The issues were identified by independent security testing, initiated by Vodafone as part of our routine security measures, and fixed at the time by Huawei."

Huawei itself told us: "We were made aware of historical vulnerabilities in 2011 and 2012 and they were addressed at the time. Software vulnerabilities are an industry-wide challenge. Like every ICT vendor we have a well-established public notification and patching process, and when a vulnerability is identified we work closely with our partners to take the appropriate corrective action."

Prior to removing the Telnet server, Huawei was said to have insisted in 2011 on using the diagnostic service to configure and test the network devices. Bloomberg reported, citing a leaked internal memo from then-Vodafone CISO Bryan Littlefair, that the Chinese manufacturer thus refused to completely disable the service at first:

Vodafone said Huawei then refused to fully remove the backdoor, citing a manufacturing requirement. Huawei said it needed the Telnet service to configure device information and conduct tests including on Wi-Fi, and offered to disable the service after taking those steps, according to the document.

El Reg understands that while Huawei indeed resisted removing the Telnet functionality from the affected items – broadband network gateways in the core of Vodafone Italy's 3G network – this was done to the satisfaction of all involved parties by the end of 2011, with another network-level product de-Telnet-ised in 2012.

Broadband network gateways in 3G UMTS mobile networks are described in technical detail in this Cisco (sorry) PDF . The devices are also known as Broadband Remote Access Servers and sit at the edge of a network operator's core.

The issue is separate from Huawei's failure to fully patch consumer-grade routers , as exclusively revealed by The Register in March.

Plenty of other things (cough, cough, Cisco) to panic about

Characterising this sort of Telnet service as a covert backdoor for government spies is a bit like describing your catflap as an access portal that allows multiple species to pass unhindered through a critical home security layer. In other words, massively over-egging the pudding.

Many Reg readers won't need it explaining, but Telnet is a routinely used method of connecting to remote devices for management purposes. When deployed with appropriate security and authentication controls in place, it can be very useful. In Huawei's case, the Telnet service wasn't facing the public internet, and was used to set up and test devices.

Look, it's not great that this was hardcoded into the equipment and undocumented – it was, after all, declared a security risk – and had to be removed after some pressure. However, it's not quite the hidden deliberate espionage backdoor for Beijing that some fear.

Twitter-enabled infoseccer Kevin Beaumont also shared his thoughts on the story, highlighting the number of vulns in equipment from Huawei competitor Cisco, a US firm:


For example, a pretty bad remote access hole was discovered in some Cisco gear , which the mainstream press didn't seem too fussed about. Ditto hardcoded root logins in Cisco video surveillance boxes. Lots of things unfortunately ship with insecure remote access that ought to be removed; it's not evidence of a secret backdoor for state spies.

Given Bloomberg's previous history of trying to break tech news, when it claimed that tiny spy chips were being secretly planted on Supermicro server motherboards – something that left the rest of the tech world scratching its collective head once the initial dust had settled – it may be best to take this latest revelation with a pinch of salt. Telnet wasn't even mentioned in the latest report from the UK's Huawei Cyber Security Evaluation Centre, which savaged Huawei's pisspoor software development practices.

While there is ample evidence in the public domain that Huawei is doing badly on the basics of secure software development, so far there has been little that tends to show it deliberately implements hidden espionage backdoors. Rhetoric from the US alleging Huawei is a threat to national security seems to be having the opposite effect around the world.

With Bloomberg, an American company, characterising Vodafone's use of Huawei equipment as "defiance" showing "that countries across Europe are willing to risk rankling the US in the name of 5G preparedness," it appears that the US-Euro-China divide on 5G technology suppliers isn't closing up any time soon. ®

Bootnote

This isn't shaping up to be a good week for Bloomberg. Only yesterday High Court judge Mr Justice Nicklin ordered the company to pay up £25k for the way it reported a live and ongoing criminal investigation.

[May 17, 2019] Shareholder Capitalism, the Military, and the Beginning of the End for Boeing

Highly recommended!
Notable quotes:
"... Like many of its Wall Street counterparts, Boeing also used complexity as a mechanism to obfuscate and conceal activity that is incompetent, nefarious and/or harmful to not only the corporation itself but to society as a whole (instead of complexity being a benign byproduct of a move up the technology curve). ..."
"... The economists who built on Friedman's work, along with increasingly aggressive institutional investors, devised solutions to ensure the primacy of enhancing shareholder value, via the advocacy of hostile takeovers, the promotion of massive stock buybacks or repurchases (which increased the stock value), higher dividend payouts and, most importantly, the introduction of stock-based pay for top executives in order to align their interests to those of the shareholders. These ideas were influenced by the idea that corporate efficiency and profitability were impinged upon by archaic regulation and unionization, which, according to the theory, precluded the ability to compete globally. ..."
"... "Return on Net Assets" (RONA) forms a key part of the shareholder capitalism doctrine. ..."
"... If the choice is between putting a million bucks into new factory machinery or returning it to shareholders, say, via dividend payments, the latter is the optimal way to go because in theory it means higher net returns accruing to the shareholders (as the "owners" of the company), implicitly assuming that they can make better use of that money than the company itself can. ..."
"... It is an absurd conceit to believe that a dilettante portfolio manager is in a better position than an aviation engineer to gauge whether corporate investment in fixed assets will generate productivity gains well north of the expected return for the cash distributed to the shareholders. But such is the perverse fantasy embedded in the myth of shareholder capitalism ..."
"... When real engineering clashes with financial engineering, the damage takes the form of a geographically disparate and demoralized workforce: The factory-floor denominator goes down. Workers' wages are depressed, testing and quality assurance are curtailed. ..."
May 17, 2019 | www.nakedcapitalism.com

The fall of the Berlin Wall and the corresponding end of the Soviet Empire gave the fullest impetus imaginable to the forces of globalized capitalism, and correspondingly unfettered access to the world's cheapest labor. What was not to like about that? It afforded multinational corporations vastly expanded opportunities to fatten their profit margins and increase the bottom line with seemingly no risk posed to their business model.

Or so it appeared. In 2000, aerospace engineer L.J. Hart-Smith's remarkable paper, sardonically titled "Out-Sourced Profits – The Cornerstone of Successful Subcontracting," laid out the case against several business practices of Hart-Smith's previous employer, McDonnell Douglas, which had incautiously ridden the wave of outsourcing when it merged with the author's new employer, Boeing. Hart-Smith's intention in telling his story was a cautionary one for the newly combined Boeing, lest it follow its then recent acquisition down the same disastrous path.

Of the manifold points and issues identified by Hart-Smith, there is one that stands out as the most compelling in terms of understanding the current crisis enveloping Boeing: The embrace of the metric "Return on Net Assets" (RONA). When combined with the relentless pursuit of cost reduction (via offshoring), RONA taken to the extreme can undermine overall safety standards.

Related to this problem is the intentional and unnecessary use of complexity as an instrument of propaganda. Like many of its Wall Street counterparts, Boeing also used complexity as a mechanism to obfuscate and conceal activity that is incompetent, nefarious and/or harmful to not only the corporation itself but to society as a whole (instead of complexity being a benign byproduct of a move up the technology curve).

All of these pernicious concepts are branches of the same poisoned tree: " shareholder capitalism ":

[A] notion best epitomized by Milton Friedman that the only social responsibility of a corporation is to increase its profits, laying the groundwork for the idea that shareholders, being the owners and the main risk-bearing participants, ought therefore to receive the biggest rewards. Profits therefore should be generated first and foremost with a view toward maximizing the interests of shareholders, not the executives or managers who (according to the theory) were spending too much of their time, and the shareholders' money, worrying about employees, customers, and the community at large. The economists who built on Friedman's work, along with increasingly aggressive institutional investors, devised solutions to ensure the primacy of enhancing shareholder value, via the advocacy of hostile takeovers, the promotion of massive stock buybacks or repurchases (which increased the stock value), higher dividend payouts and, most importantly, the introduction of stock-based pay for top executives in order to align their interests to those of the shareholders. These ideas were influenced by the idea that corporate efficiency and profitability were impinged upon by archaic regulation and unionization, which, according to the theory, precluded the ability to compete globally.

"Return on Net Assets" (RONA) forms a key part of the shareholder capitalism doctrine. In essence, it means maximizing the returns of those dollars deployed in the operation of the business. Applied to a corporation, it comes down to this: If the choice is between putting a million bucks into new factory machinery or returning it to shareholders, say, via dividend payments, the latter is the optimal way to go because in theory it means higher net returns accruing to the shareholders (as the "owners" of the company), implicitly assuming that they can make better use of that money than the company itself can.

It is an absurd conceit to believe that a dilettante portfolio manager is in a better position than an aviation engineer to gauge whether corporate investment in fixed assets will generate productivity gains well north of the expected return for the cash distributed to the shareholders. But such is the perverse fantasy embedded in the myth of shareholder capitalism.

Engineering reality, however, is far more complicated than what is outlined in university MBA textbooks. For corporations like McDonnell Douglas, for example, RONA was used not as a way to prioritize new investment in the corporation but rather to justify disinvestment in the corporation. This disinvestment ultimately degraded the company's underlying profitability and the quality of its planes (which is one of the reasons the Pentagon helped to broker the merger with Boeing; in another perverse echo of the 2008 financial disaster, it was a politically engineered bailout).

RONA in Practice

When real engineering clashes with financial engineering, the damage takes the form of a geographically disparate and demoralized workforce: The factory-floor denominator goes down. Workers' wages are depressed, testing and quality assurance are curtailed. Productivity is diminished, even as labor-saving technologies are introduced. Precision machinery is sold off and replaced by inferior, but cheaper, machines. Engineering quality deteriorates. And the upshot is that a reliable plane like Boeing's 737, which had been a tried and true money-spinner with an impressive safety record since 1967, becomes a high-tech death trap.

The drive toward efficiency is translated into a drive to do more with less. Get more out of workers while paying them less. Make more parts with fewer machines. Outsourcing is viewed as a way to release capital by transferring investment from skilled domestic human capital to offshore entities not imbued with the same talents, corporate culture and dedication to quality. The benefits to the bottom line are temporary; the long-term pathologies become embedded as the company's market share begins to shrink, as the airlines search for less shoddy alternatives.

You must do one more thing if you are a Boeing director: you must erect barriers to bad news, because there is nothing that bursts a magic bubble faster than reality, particularly if it's bad reality.

The illusion that Boeing sought to perpetuate was that it continued to produce the same thing it had produced for decades: namely, a safe, reliable, quality airplane. But it was doing so with a production apparatus that was stripped, for cost reasons, of many of the means necessary to make good aircraft. So while the wine still came in a bottle signifying Premier Cru quality, and still carried the same price, someone had poured out the contents and replaced them with cheap plonk.

And that has become remarkably easy to do in aviation. Because Boeing is no longer subject to proper independent regulatory scrutiny. This is what happens when you're allowed to " self-certify" your own airplane , as the Washington Post described: "One Boeing engineer would conduct a test of a particular system on the Max 8, while another Boeing engineer would act as the FAA's representative, signing on behalf of the U.S. government that the technology complied with federal safety regulations."

This is a recipe for disaster. Boeing relentlessly cut costs, it outsourced across the globe to workforces that knew nothing about aviation or aviation's safety culture. It sent things everywhere on one criteria and one criteria only: lower the denominator. Make it the same, but cheaper. And then self-certify the plane, so that nobody, including the FAA, was ever the wiser.

Boeing also greased the wheels in Washington to ensure the continuation of this convenient state of regulatory affairs for the company. According to OpenSecrets.org , Boeing and its affiliates spent $15,120,000 in lobbying expenses in 2018, after spending, $16,740,000 in 2017 (along with a further $4,551,078 in 2018 political contributions, which placed the company 82nd out of a total of 19,087 contributors). Looking back at these figures over the past four elections (congressional and presidential) since 2012, these numbers represent fairly typical spending sums for the company.

But clever financial engineering, extensive political lobbying and self-certification can't perpetually hold back the effects of shoddy engineering. One of the sad byproducts of the FAA's acquiescence to "self-certification" is how many things fall through the cracks so easily.

[May 03, 2019] How can we regularly update a disconnected system (A system without internet connection)

Aug 10, 2017 | access.redhat.com
Resolution

Depending on the environment and circumstances, there are different approaches for updating an offline system.

Approach 1: Red Hat Satellite

For this approach a Red Hat Satellite server is deployed. The Satellite receives the latest packages from Red Hat repositories. Client systems connect to the Satellite and install updates. More details on Red Hat Satellite are available here: https://www.redhat.com/red_hat_network/ . Please also refer to the document Update a Disconnected Red Hat Network Satellite .


Approach 2: Download the updates on a connected system

If a second, similar system exists, then that system can download the applicable errata packages. After downloading, the errata packages can be applied to other systems. More documentation: How to update offline RHEL server without network connection to Red Hat Network/Proxy/Satellite?
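A rough sketch of this approach using yumdownloader from the yum-utils package (the package names, directory, and host below are purely illustrative):

On the connected system:
$ sudo yum install yum-utils
$ mkdir /tmp/updates && cd /tmp/updates
$ yumdownloader --resolve openssl openssl-libs
$ scp /tmp/updates/*.rpm admin@offline-host:/tmp/updates/

On the disconnected system:
$ sudo yum localinstall /tmp/updates/*.rpm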


Approach 3: Update with new minor release media

DVD media of new RHEL minor releases (e.g. RHEL 6.1) are available from RHN. These media images can be used directly on the system for updating, or offered (e.g. via HTTP) to other systems as a yum repository for updating. For more details refer to:


Approach 4: Manually downloading and installing or updating packages

It is possible to download and install errata packages. For details refer to this document: How do I download security RPMs using the Red Hat Errata Website? .


Approach 5: Create a Local Repository

This approach is applicable to RHEL 5/6/7. It requires a registered server that is connected to the Red Hat repositories and runs the same major version. The connected system can use reposync to download all the RPMs from a specified repository into a local directory. Then, using HTTP, NFS, FTP, or a local directory path (file://), that directory can be configured as a repository which yum can use to install packages and resolve dependencies.

How to create a local mirror of the latest update for Red Hat Enterprise Linux 5, 6, 7 without using Satellite server?
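A rough sketch of Approach 5 on a connected, registered RHEL 7 system (the repository ID and paths are illustrative; the reposync flags mirror those used in the comments below):

$ sudo yum install yum-utils createrepo
$ reposync -g -l -n -p /var/mirror --repoid=rhel-7-server-rpms
$ createrepo /var/mirror/rhel-7-server-rpms

After copying /var/mirror to the disconnected system, point yum at it with a repo file such as /etc/yum.repos.d/local.repo:

[local-rhel-7-server-rpms]
name=Local RHEL 7 mirror
baseurl=file:///var/mirror/rhel-7-server-rpms
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release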

Checking the security errata:

To check the security errata on a system that is not connected to the internet, download a copy of the updateinfo.xml.gz file from an identical registered system. The detailed steps can be checked in the How to update security Erratas on system which is not connected to internet ? knowledgebase article.

Root Cause

Without a connection to the RHN/RHSM the updates have to be transferred over other paths. These are partly hard to implement and automate.

This solution is part of Red Hat's fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.


1 February 2012 3:56 PM Randy Zagar

"Approach 3: update with new minor release media" will not work with RHEL 6. Many packages (over 1500) in the "optional" channels simply are not present on any iso images. There is an open case , but the issue will not be addressed before RHEL 6 Update 4 (and possibly never).

22 May 2014 9:27 AM Umesh Susvirkar

I agree with Randy Zagar: "optional" packages should be available offline along with the other channels which are not available in ISOs.

12 July 2016 7:20 PM Adrian Kennedy

Can Approach 5 "additional server, reposync fetching" be applied with RHEL 7 servers?

4 August 2016 8:55 AM Dejan Cugalj


Yes. You need to:
- subscribe server to RH 
- synchronize repositories with reposync util
- up to 40GB per major release of RHEL.
16 August 2016 10:54 PM Michael White

However, won't I need to stand up another RHEL 7 server in addition to the RHEL 6 server?

8 August 2017 7:01 PM John Castranio

Correct. When using an external server to reposync updates, you will need one system for each Major Version of RHEL that you want to sync packages from.

RHEL 7 does not have access to RHEL 6 repositories just as RHEL 6 can't access RHEL 7 repositories

15 January 2019 10:14 PM BRIAN KEYES

what I am looking for is the instructions on the reposync install AND how to update off line clients

do I have to manually install apache?

16 January 2019 10:50 PM Randy Zagar

You will need: a RH Satellite or RH Proxy server, an internal yum server, and a RHN client for each OS variant (and architecture) you intend to support "offline". E.g. supporting 6Server, 6Client, and 6Workstation for i686 and x86_64 would normally require 6 RHN clients, but only three RHN clients would be necessary for RHEL7, as there's no support for i686 architecture

Yum clients can (according to the docs) use nfs resource paths in the baseurl statement, so apache is not strictly necessary on your yum server, but most people do it that way...

Each RHN client will need: local storage to store packages downloaded via reposync (e.g. "reposync -d -g -l -n -t -p /my/storage --repoid=rhel-i686-workstation-optional-6"). You'll need to run "createrepo" on each repository that gets updated, and you'll need to create an rsync service that provides access to each clients' /my/storage volume

Your internal yum server will need a cron script to run rsync against your RHN clients so you can collect all these software channels in one spot.

You'll also need to create custom yum repo files for your client systems (e.g. redhat-6Workstation.repo) that will point to the correct repositories on your yum server.

I'd recommend you NOT run these cron scripts during normal business hours... your sys-admins will want a stable copy so they can clone things for other offline networks.

If you're clever, you can convince one RHN client system to impersonate the different OS variants, reducing the number of systems you need to deploy.

You'll also most likely want to run "hardlink" on your yum server pretty regularly as there's lots of redundant packages across each OS variant.

[May 03, 2019] How to enable/disable repository using Subscription Manager or Yum-Utils by Roshan V Sharma

Sep 29, 2017 | developers.redhat.com

This blog post is intended to resolve the following issues and answer the following questions.

  1. How to enable a repository using the Red Hat Subscription Manager/yum?
  2. Need to access a repository using the Red Hat Subscription Manager/yum?
  3. How to disable a repository using the Red Hat Subscription Manager/yum?
  4. How to subscribe a child channel using the Red Hat Subscription Manager/yum?

To enable or disable a repository using Subscription-Manager or Yum-Utils you'll need:

  1. Red Hat Enterprise Linux 6 or higher.
  2. Red Hat Subscription Management (RHSM).
  3. Red Hat Subscription Manager.
  4. System registered to RHN classic (see system register with rhn classic ).

If you are using the latest version of Red Hat Enterprise Linux, this will be very useful (I suggest updating your system at regular intervals).


Solution

Before that, we need to know what a repository is.

A repository is a set of content, based on the products the system is subscribed to and the content delivery network defined in the rhsm.conf file on your system.

To enable or disable a repository, you have to be the root user.

First check the repository list, then enable or disable the repository you want, changing ReposName to the name of your repository.
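A minimal sketch of those subscription-manager commands (ReposName is a placeholder for the real repository ID):

$ sudo subscription-manager repos --list                 # list all repositories available to the system
$ sudo subscription-manager repos --enable=ReposName     # enable a repository
$ sudo subscription-manager repos --disable=ReposName    # disable a repository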

In some universities and enterprises the subscription manager is blocked; in that case you can use yum to enable or disable a repository.

To use yum for enabling or disabling repositories, you need the yum-config-manager command, which is provided by the yum-utils package.

Before enabling a repository, make sure that all repositories are in a stable state. First check the enabled repositories, then enable or disable the repository you want, again changing ReposName to the name of your repository.

Some package installations use the --enablerepo option to enable a repository only for that install.
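A minimal sketch of the yum-utils equivalents (again, ReposName and the package name are placeholders):

$ sudo yum install yum-utils                             # provides yum-config-manager
$ yum repolist enabled                                   # check which repositories are enabled
$ sudo yum-config-manager --enable ReposName
$ sudo yum-config-manager --disable ReposName
$ sudo yum --enablerepo=ReposName install somepackage    # enable a repo for a single install only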

Disabling Subscription Manager's repository management

When a system is registered using Subscription Manager, a file named redhat.repo is created; it is a special yum repository. Maintaining redhat.repo may not be desirable in some environments, and it can create noise in content management operations if that repository is not the one actually used for subscriptions. You can disable it by setting the rhsm manage_repos option to a value of zero (0).
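A sketch of that setting; manage_repos lives in the [rhsm] section of /etc/rhsm/rhsm.conf:

# /etc/rhsm/rhsm.conf (excerpt)
[rhsm]
# when set to 0, subscription-manager will not create or update redhat.repo
manage_repos = 0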



[Apr 30, 2019] The end of Scientific Linux [LWN.net]

Apr 25, 2019 | lwn.net

The letter from James Amundson, Head, Fermilab Scientific Computing Division

Scientific Linux is driven by Fermilab's scientific mission and focused
on the changing needs of experimental facilities.

Fermilab is looking ahead to DUNE[1] and other future international
collaborations. One part of this is unifying our computing platform with
collaborating labs and institutions.

Toward that end, we will deploy CentOS 8 in our scientific computing
environments rather than develop Scientific Linux 8. We will
collaborate with CERN and other labs to help make CentOS an even better
platform for high-energy physics computing.

Fermilab will continue to support Scientific Linux 6 and 7 through the
remainder of their respective lifecycles. Thank you to all who have
contributed to Scientific Linux and who continue to do so.

[1] For more information on DUNE please visit https://www.dunescience.org/

James Amundson
Head, Scientific Computing Division

Office of the CIO

[Apr 01, 2019] The Seven Computational Cluster Truths

Inspired by "The seven networking truth by R. Callon, April 1, 1996
Feb 26, 2019 | www.softpanorama.org

Adapted for HPC clusters by Nikolai Bezroukov on Feb 25, 2019

Status of this Memo
This memo provides information for the HPC community. This memo does not specify a standard of any kind, except in the sense that all standards must implicitly follow the fundamental truths. Distribution of this memo is unlimited.
Abstract
This memo documents seven fundamental truths about computational clusters.
Acknowledgements
The truths described in this memo result from extensive study over an extended period of time by many people, some of whom did not intend to contribute to this work. The editor would like to thank the HPC community for helping to refine these truths.
1. Introduction
These truths apply to HPC clusters and are not limited to TCP/IP, GPFS, the scheduler, or any other particular component of an HPC cluster.
2. The Fundamental Truths
(1) Some things in life can never be fully appreciated or understood unless experienced firsthand. Most problems in a large computational cluster can never be fully understood by someone who has never run a cluster with more than 16, 32 or 64 nodes.

(2) Every problem or upgrade on a large cluster always takes at least twice as long to solve as it seems like it should.

(3) One size never fits all, but complexity increases non-linearly with the size of the cluster. In some areas (storage, networking) the problem grows exponentially with the size of the cluster.
(3a) A supercluster is an attempt to solve multiple separate problems via a single complex solution. But its size creates another set of problems which might outweigh the set of problems it intends to solve.

(3b) With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea.

(3c) Large, Fast, Cheap: you can't have all three.

(4) On a large cluster, issues are more interconnected with each other, and a typical failure often affects a larger number of nodes or components and takes more effort to resolve.
(4a) Superclusters prove that it is always possible to add another level of complexity to each cluster layer, especially the networking layer, until only applications that use a single node run well.

(4b) On a supercluster it is easier to move a networking problem around than it is to solve it.

(4c) You never understand how bad and buggy your favorite scheduler is until you deploy it on a supercluster.

(4d) If a solution that was put in place for a particular cluster does not work, it will always be proposed later for a new cluster under a different name...

(5) Functioning of a large computational cluster is indistinguishable from magic.
(5a) The user superstition that "the more cores, the better" is incurable, but the users' desire to run their mostly useless models on as many cores as possible can and should be resisted.

(5b) If you do not know what to do with a problem on the supercluster, you can always "wave a dead chicken", i.e. perform a ritual operation on the crashed software or hardware that will most probably be futile but is nevertheless useful to convince "important others" and frustrated users that an appropriate degree of effort has been expended.

(5c) Downtime of a large computational cluster has a mysterious, quasi-religious ritual quality to it: in modest doses it increases the respect of the users toward the HPC support team. But only up to a certain limit.

(6) "The more cores the better" is a religious truth similar to the belief in Flat Earth during Middle Ages and any attempt to challenge it might lead to burning of the heretic at the stake.

(6a) The number of cores in the cluster has a religious quality and in the eyes of users and management has power almost equal to Divine Spirit. In the stage of acquisition of the hardware it outweighs all other considerations, driving towards the cluster with maximum possible number of cores within the allocated budget Attempt to resist buying for computational nodes faster CPUs with less cores are futile.

(6b) The best way to change your preferred hardware supplier is buy a large computational cluster.

(6c) Users will always routinely abuse the facility by specifying more cores than they actually need for their runs

(7) For all resources, whatever the size of your cluster, you always need more.

(7a) Overhead increases exponentially with the size of the cluster, until all resources of the support team are consumed by maintaining the cluster and none can be spent on helping the users.

(7b) Users will always try to run more applications and use more languages than the cluster team can meaningfully support.

(7c) The most pressure on the support team is exerted by the users whose applications are the least useful for the company and/or the most questionable from a scientific standpoint.

(7d) The level of ignorance of computer architecture among 99% of users of large computational clusters can't be overestimated.

Security Considerations

This memo raises no security issues. However, security protocols used in the HPC cluster are subject to those truths.

References

The references have been deleted in order to protect the guilty and avoid enriching the lawyers.

[Mar 06, 2019] Can not install RHEL 7 on disk with existing partitions

Notable quotes:
"... In the case of the partition tool, it is too complicated for a common user to use so don't assume that the person using it is an idiot. ..."
May 12, 2014 | access.redhat.com
Can not install RHEL 7 on disk with existing partitions (latest response May 23, 2014 at 8:23 AM)

Can not install RHEL 7 on disk with existing partitions. Menus for old partitions are grayed out. Can not select mount points for existing partitions. Can not delete them and create new ones. Menus say I can, but the installer says there is a partition error. It shows me a complicated menu that warns me what is about to happen and asks me to accept the changes. It does not seem to accept "accept" as an answer. Booting to the recovery mode on the USB drive and manually deleting all partitions is not a practical solution.

You have to assume that someone who is doing manual partitioning knows what he is doing. Provide a simple tool during install so that the drive can be fully partitioned and formatted. Do not try so hard to make tools that protect us from our own mistakes. In the case of the partition tool, it is too complicated for a common user to use, so don't assume that the person using it is an idiot.

The common user does not know hardware and doesn't want to know it. He expects security questions during the install and everything else should be defaults. A competent operator will back up anything before doing a new OS install. He needs the details and doesn't need an installer that tells him he can't do what he has done for 20 years.