Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

  Neatperl -- a simple Perl prettyprinter
 based on "fuzzy" determination of nesting level

Nikolai Bezroukov, 2019,   Licensed under Perl Artistic license
Version 0.81 (Oct 9, 2019)


The pretty printer Neatperl can be called a "fuzzy" pretty-printer. It does not perform full lexical analysis (which is difficult for Perl because of its complex lexical structure, further complicated by innovations such as custom delimiters in q, qq and qr strings). Instead it relies on analysis of a limited context of each line (prefix and suffix) to "guess" the correct nesting level. It does not reorganize the text in any way other than re-indentation.
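The prefix/suffix idea can be illustrated with a minimal sketch. This is not Neatperl's actual code: the function name reindent, the tab-width parameter, and the rule "a leading '}' closes a level, a trailing '{' opens one" are assumptions chosen only to show how a nesting level can be guessed without lexical analysis.

```perl
use strict;
use warnings;

# Hypothetical sketch (not taken from Neatperl): guess the nesting level
# from the first and last significant character of each line.
sub reindent {
    my ($tab, @lines) = @_;
    my ($nest, @out) = (0);
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;    # drop old indent and trailing blanks
        if ($text eq '') { push @out, ''; next; }
        (my $code = $text) =~ s/\s+#.*$//;       # crude removal of a trailing comment
        my $first = substr($code, 0, 1);
        my $last  = substr($code, -1, 1);
        $nest-- if $first eq '}' && $nest > 0;   # prefix: a block closes on this line
        push @out, (' ' x ($tab * $nest)) . $text;
        $nest++ if $last eq '{';                 # suffix: a block opens on this line
    }
    return @out;
}

# Usage: re-indent a badly indented fragment with a tab width of 4.
print "$_\n" for reindent(4,
    'if ($x) {', 'print "a";', '   } else {', '       print "b";', '}');
```

A sketch like this never touches anything inside the line, which is why the only change it can make to the text is the indentation.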

Neatperl also computes some basic statistics about the code, such as the number of lines without comments, the number of subroutines, the number of blocks, etc.
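As a rough illustration of such statistics, here is a small sketch. The function name code_stats and the counting rules are assumptions, not Neatperl's definitions: "lines without comments" is taken to mean non-blank lines that are not whole-line comments, and a "block" is any line ending with an opening brace.

```perl
use strict;
use warnings;

# Hypothetical sketch: line-based statistics, no lexical analysis.
sub code_stats {
    my @lines = @_;
    my %stat = (lines => 0, subs => 0, blocks => 0);
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                     # skip blank lines
        next if $line =~ /^\s*#/;                     # skip whole-line comments
        $stat{lines}++;
        $stat{subs}++   if $line =~ /^\s*sub\s+\w+/;  # named subroutine header
        $stat{blocks}++ if $line =~ /\{\s*$/;         # line that opens a block
    }
    return %stat;
}

my %s = code_stats('# demo', '', 'sub f {', '    return 1;', '}');
print "$_=$s{$_}\n" for sort keys %s;
```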

For a reasonable Perl style typically found in production scripts the results are quite satisfactory. Of course, it will not work for compressed or obfuscated code.

This is a relatively non-traditional approach, as a prettyprinter typically attempts to implement full lexical analysis of the language with some elements of syntax analysis; see for example my (very old) NEATPL pretty printer ( http://www.softpanorama.org/Articles/Oldies/neatpl.pdf ) -- one of the first programs that I have written. It implements a full lexical analyzer for PL/1. The Perltidy architecture is closer to that traditional approach too.

The main advantage is that such an approach allows one to implement a useful pretty printer in less than 500 lines of Perl source code, with around 200 lines implementing the formatting algorithm. Such small scripts are more maintainable and have less chance to "drop dead" and become abandonware after the initial author loses interest and no longer supports the script.

Neatperl does not depend on any non-standard Perl modules, and its distribution consists of just two items: the script itself and the readme file. This can be an important advantage, as installing Perl modules in a corporate environment is often not that simple and you can run into a bureaucratic nightmare. Also, with many modules used you always risk compatibility hell. It is sad that Perl does not have a zipped format like jar files for Java, which allow packaging the program with its dependencies as a single file.

Another important advantage is that this is a very safe approach, which normally does not introduce any errors in Perl code, with the exception of indented HERE lines, which might be incorrectly "re-indented" based on the current nesting. As Perl has a very complex lexical structure and its grammar is not context-free, parsing it is a daunting programming task and can never be guaranteed to be correct. This means that the "fuzzy" prettyprinter approach is a safer approach for such a language.

But even it can mangle some parts of the script, such as HERE strings, when some "too inventive" delimiters are used. In Perl the delimiter can be defined via a single-quoted or double-quoted string; Neatperl does not recognize perverted HERE strings defined via q or qq notation.
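To make the limitation concrete, here is a minimal sketch of line-based HERE document detection. It is an assumption about how such detection could look, not Neatperl's code: the helper names heredoc_terminator and heredoc_body_flags are hypothetical, and only the bare, single-quoted and double-quoted delimiter forms are recognized. Any other form is silently missed, which is exactly the kind of case where the body could be re-indented by mistake.

```perl
use strict;
use warnings;

# Hypothetical sketch: return the terminator if the line opens a HERE
# document with a quoted (<<"EOT", <<'EOT') or bare (<<EOT) delimiter.
sub heredoc_terminator {
    my ($line) = @_;
    return $2 if $line =~ /<<(["'])(.+?)\1/;
    return $1 if $line =~ /<<([A-Za-z_]\w*)/;
    return;                                   # no recognizable HERE document
}

# Flag every line (already chomped) that must be copied verbatim.
sub heredoc_body_flags {
    my @lines = @_;
    my ($term, @flags);
    for my $line (@lines) {
        if (defined $term) {
            push @flags, 1;                   # body or terminator line
            undef $term if $line eq $term;    # terminator must match exactly
        } else {
            push @flags, 0;
            $term = heredoc_terminator($line);
        }
    }
    return @flags;
}

my @flags = heredoc_body_flags(
    'print <<"EOT";', '  text {', 'EOT', 'my $y = $x << 2;');
print join(',', @flags), "\n";
```

Note that the shift operator in the last line is not mistaken for a HERE document because the bare form requires an identifier immediately after `<<`.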

Neatperl does not try to re-indent a string that starts in the first position. Such strings always remain intact.

There is no free lunch, and such a limited-context approach means that sometimes (rarely) the nesting level can be determined incorrectly. There might also be problems with determining the correct end of HERE literals or q and qq literals. A missed HERE string that has a non-zero fixed indent can be shifted left or right, which might not be a good thing (HERE strings with zero indent are safe). So a fuzzy prettyprinter is best for your own scripts, in which you can maintain a safe, prettyprinter-friendly Perl style. For scripts written by other people your mileage may vary, but even in this case it is a great diagnostic tool. It also greatly helps to understand scripts written by other people.

To correct this situation, three pseudo-comments (pragmas) were introduced, with which you can control the formatting and correct formatting errors. All pseudo-comments should start at the beginning of the line. No leading spaces are allowed.

 Currently Neatperl allows three types of pseudo-comments:

  1. Switching formatting off and on for a set of lines. This idea is similar to HERE documents and allows skipping portions of the script that are too difficult to format correctly. One example is a HERE statement with indented lines, when re-indenting them to the current nesting level (which is the default action of the formatter) is undesirable.
  2. Correcting the nesting level if it was determined incorrectly. The directive is "#%NEST", which has several forms (more can be added if necessary ;-):

For example, if Neatperl did not correctly recognize the point where a particular control structure closes, you can close it yourself with the directive

#%NEST-- 

or

#%NEST=0 

NOTES:

You can also arbitrarily increase and decrease the indent with this directive.
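How a formatter might honor these directives can be sketched as follows. This is an illustration under assumptions, not Neatperl's implementation: the function name apply_nest_pragma is hypothetical, and only the two forms shown above (#%NEST-- and #%NEST= followed by a number) are handled. In line with the rule that pseudo-comments must start at the beginning of the line, an indented directive is ignored.

```perl
use strict;
use warnings;

# Hypothetical sketch: adjust the current nesting level when the line is
# a #%NEST pseudo-comment starting in the first column.
sub apply_nest_pragma {
    my ($line, $nest) = @_;
    return $nest unless $line =~ /^#%NEST(--|=(\d+))\s*$/;
    return $2 + 0 if defined $2;          # "#%NEST=N": set the level explicitly
    return $nest > 0 ? $nest - 1 : 0;     # "#%NEST--": close one level
}

my $nest = 3;
$nest = apply_nest_pragma('#%NEST--', $nest);    # one level closed
$nest = apply_nest_pragma('#%NEST=0', $nest);    # reset to the top level
print "nesting level is now $nest\n";
```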

Because Neatperl maintains a stack of control keywords while it reorganizes the text, it also produces some useful diagnostic messages.
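The kind of diagnostics a keyword stack makes possible can be sketched like this. The function name check_blocks, the keyword list and the message wording are all assumptions made for the illustration; Neatperl's actual messages are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical sketch: keep a stack of block-opening keywords with their
# line numbers and report braces that do not pair up.
sub check_blocks {
    my @lines = @_;
    my (@stack, @msgs);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        (my $code = $line) =~ s/^\s+|\s+$//g;
        next if $code eq '' || $code =~ /^#/;       # blank line or comment
        if ($code =~ /^\}/) {                       # line closes a block
            if (@stack) { pop @stack }
            else { push @msgs, "line $n: closing brace without an open block" }
        }
        if ($code =~ /\{$/) {                       # line opens a block
            my ($kw) = $code =~ /\b(sub|if|elsif|else|unless|while|until|for|foreach)\b/;
            push @stack, { kw => defined $kw ? $kw : 'block', line => $n };
        }
    }
    push @msgs, "line $_->{line}: '$_->{kw}' block is never closed" for @stack;
    return @msgs;
}

print "$_\n" for check_blocks('sub f {', 'if ($x) {', 'print 1;', '}');
```

Because each stack entry remembers the keyword and the line where the block was opened, the message can point at the start of the unclosed construct rather than at the end of the file.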

For most scripts Neatperl is able to determine the correct nesting level and proper indentation. Of course, to be successful, this approach requires a certain (very reasonable) layout of the script. The main requirement is that multiline control statements should start and end on a separate line.

One-liners (control statements which start and end on the same line) are acceptable.
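A short fragment in the layout described above may help. The variable names are invented for the illustration; the point is only where the braces sit.

```perl
use strict;
use warnings;

my @queue = (3, 1, 2);
my $total = 0;

# Multiline control statements: the opening line ends with '{'
# and the closing '}' starts a line of its own.
foreach my $item (@queue) {
    if ($item > 1) {
        $total += $item;
    } else {
        $total -= $item;
    }
}

# One-liner: the statement opens and closes on the same line,
# so the nesting level is unchanged after it.
if ($total > 0) { print "total=$total\n" }
```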

While all of us have seen pretty perverted formatting styles in some scripts, this is typically an anomaly in production-quality scripts. Most production-quality scripts display a very reasonable control statement layout, the one that is expected by this pretty printer. But again, that's why I call this pretty printer "fuzzy". For example, for any script compressed to eliminate whitespace this approach to pretty printing is not successful.

INVOCATION

 neatperl [options] [file_to_process]
or
 neatperl -f [other_options] [file_to_process] # in this case the text will be replaced with formatted text, 
                                              # backup will be saved in the same directory
or
cat file |  neatperl -p [other_options] > formatted_text # invocation as pipe

OPTIONS

PARAMETERS

  1st -- name of the file to be formatted


Old News ;-)

[Sep 03, 2019] Uploaded to GitHub

This is still a raw version (0.4), so it is still beta, but usable. It works for all my scripts and for the scripts by other authors that I tested.
