Pythonizer user guide

Version 0.85 (Oct 12, 2020)


NOTE: In version 0.8 you need to set the PERL5LIB variable to point to the directory with the modules in order to run the program. See the Installation section below.


Introduction

Some organizations are now converting their old Perl codebases into other scripting languages, such as Python. A more common task, however, is maintaining existing Perl scripts when the person assigned to the task knows only Python.

University graduates now typically know Python but not Perl, which creates difficulties in maintaining old codebases. In this case, a program that "explains" Perl constructs in Python terms would be extremely useful and, sometimes, a lifesaver. Of course, Perl 5 is here to stay (note what happened to the people who predicted the demise of Fortran ;-), and in most cases old scripts will stay too.

The other role is to give a quick start to system administrators who want to learn Python (for example, because they need to support researchers who work with Python) but who currently know only Perl -- many old-school sysadmins dislike Python, and for a reason ;-)

Yet another role is to provide proof that the two languages are mostly compatible and that a program in one can be translated into the other with a modest amount of effort. Although such a translation is not necessarily the best fit, in most cases it is close enough and needs only minor manual editing.

Of course, complex constructs and idioms are often translated incorrectly. Several complex issues remain unresolved (implicit conversion to strings in Perl is one such issue). As experience with Google's translation of natural languages attests, there are always around 10 to 20% of sentences (depending on the subject area of the text) that are translated incorrectly, and probably 2-3% that have an absurd or funny translation.

The idea here is that, using the "fuzzy translation" concept, it is possible to create such a tool with relatively modest effort: a tool written with some knowledge of compiler technologies that falls into the category of "small language compilers", with a total effort of around one man-year or less. Assuming ten lines of debugged code per day for a task of complexity comparable to writing a compiler, the estimated size should be around 3-5K lines of code (~1K lines for phase 1 and 2-3K lines for phase 2).

As of version 0.8 it looks like the initial idea was sound: within the 5K LOC limit it is possible to create a useful utility that transcribes Perl into Python with decent quality.

As the current codebase exceeds 4K lines, it is close to the limit at which I can maintain it as a hobby project, so some enhancements need to be abandoned or moved to the pre-pythonizer phase. This is true first of all of the insertion of proper type conversions. Right now it is left to the programmer to fix those issues.
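To make the type conversion issue concrete, here is a small hand-written sketch (not actual Pythonizer output) of what a programmer typically has to fix. The variable names are made up for illustration.

```python
# Perl, for reference:
#   my $count = "5";
#   my $total = $count + 3;      # 8: the string silently becomes a number
#   my $label = "n=" . $total;   # "n=8": the number silently becomes a string

count = "5"   # e.g. a value read from a file or from the command line

try:
    total = count + 3            # a literal transcription of the Perl addition
except TypeError as err:
    print("literal transcription fails:", err)

total = int(count) + 3           # explicit conversion added by hand
label = "n=" + str(total)        # the same for string concatenation
print(total, label)
```

Python refuses to mix strings and numbers in arithmetic, so each such place needs an explicit int(), float() or str() call.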

NOTE:

Pythonizer now creates a list of the variables that need to be declared global in order to preserve the part of the global namespace that Perl subroutines access.

This function is needed because Python has different rules of variable visibility: global variables are visible in all subroutines, but to modify them you need to declare them global explicitly (and the global variable should already exist and be initialized).
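The visibility rule can be seen in a few lines of Python. This is a minimal hand-written sketch with made-up names, not generated code:

```python
counter = 0   # the global exists and is initialized before the subs run

def show():
    # Reading a module-level variable needs no declaration.
    return counter

def bump_wrong():
    # The assignment makes 'counter' local to this function,
    # so reading it on the right-hand side fails.
    counter = counter + 1

def bump():
    global counter          # required to modify the module-level variable
    counter = counter + 1

bump()
try:
    bump_wrong()
except UnboundLocalError as err:
    print("without a global declaration:", err)
print(show())
```

This is why a translated subroutine that assigns to a Perl global needs a matching global statement.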

Installation

For testing, the main program and all modules should reside in a single directory from which you can run the program.

You need to download the files or replicate the directory via git. Let's assume that you downloaded or replicated them into /home/softpano/Pythonizer

After that you can run pythonizer from the install directory by specifying the -I /home/softpano/Pythonizer option of the Perl interpreter to point to the directory with the modules. For example:

perl -I /home/softpano/Pythonizer ~/Pythonizer/pythonizer script_to_be_translated.pl

Alternatively, you can set the PERL5LIB environment variable to point to this directory:

export PERL5LIB=/home/softpano/Pythonizer

If you already have this variable defined in your .bashrc, you can add this directory to the end after a colon, for example:

export PERL5LIB=/home/softpano/perl5/lib/:/home/softpano/Pythonizer

After that you do not need the -I option of the Perl interpreter and can invoke the program directly:

~/Pythonizer/pythonizer script_to_be_translated.pl

In order to invoke pythonizer without a path, you need to add the directory in which you installed it to the PATH or copy the program to one of the directories already in PATH, for example /usr/local/bin if you are root or ~/bin if you are a regular user.

Version 0.8 has a simple installer with which you can quickly move the main script and the modules to the proper places in your directory tree. It is called installer and accepts two arguments.

General format of invocation:

installer path_to_pythonizer path_to_modules

The directory with pythonizer and the modules should be the current directory. For example:

cd ~/Pythonizer && installer /usr/local/bin /usr/local/share/perl5

By default, if run as root, it assumes /usr/local/bin as the target directory for pythonizer and /usr/local/share/perl5 for the modules.

If run as a regular user, the defaults are ~/bin and ~/Perl5/lib

Invocation

You can run Pythonizer both in Cygwin and Linux. To "pythonize" a Perl script you need to specify the directory with the modules via the -I option of the Perl interpreter or PERL5LIB, or to put the modules in some directory already in @INC, as discussed in the Installation section.

The simplest invocation can look like this:

export PERL5LIB=/opt/Pythonizer && /opt/Pythonizer/pythonizer /path/to/your/program/script_to_be_translated.pl
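If you prefer to drive the translation from Python, the same invocation can be sketched with the subprocess module. The helper names pythonizer_command and run_pythonizer are hypothetical, and the sketch assumes the pythonizer script is executable, as in the shell example above.

```python
import os
import subprocess

def pythonizer_command(install_dir, script):
    """Build (argv, env) for one pythonizer run; nothing is executed here."""
    env = dict(os.environ)
    # Same effect as 'export PERL5LIB=<install_dir>', keeping any existing value.
    old = env.get("PERL5LIB")
    env["PERL5LIB"] = install_dir if not old else install_dir + os.pathsep + old
    argv = [os.path.join(install_dir, "pythonizer"), script]
    return argv, env

def run_pythonizer(install_dir, script):
    """Run pythonizer and return whatever exit status the process reports."""
    argv, env = pythonizer_command(install_dir, script)
    return subprocess.run(argv, env=env).returncode
```

For example, run_pythonizer("/opt/Pythonizer", "/path/to/your/program/script_to_be_translated.pl") would correspond to the shell command above.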

If the program runs to the end, you will get the "pythonized" text in /path/to/your/program.py

It also produces a protocol of the translation in /tmp/Pythonizer with side-by-side Perl and Python code, which allows you to analyze the protocol and detect places that need to be commented out or translated manually.

If __DATA__ or __END__ is used, a separate file with the extension .data (/path/to/your/program.data for the example above) will be created with the content of this section of the Perl script.

Pre-pythonizer implements the first phase of translation

Processing consists of two passes, which currently are not integrated in any way; pre_pythonizer mainly refactors the Perl code to create a main procedure and move all subroutines up, so that they are defined before use.

Running this phase also reformats the Perl script in a way that slightly increases the chances that the script will be translated with fewer errors. Opening and closing curly brackets are put on separate lines to ease the job of the lexical scanner in Pythonizer.

It can be used either as a separate utility or in an integrated way via the -r (refactor) option of pythonizer.

pre_pythonizer [options] <file>  # currently performs refactoring of the Perl script,
                                 # pushing subroutines to the top and creating a main sub
                                 # out of code not included in any subroutine.
pythonizer [options] <file>

The first pass is currently fully optional, and the needed transformations of the Perl code can be performed by other utilities. It just slightly increases the probability of a more correct translation of the code. The main function is to move subroutines up. With option -m (available only in standalone mode) you can also create a main procedure, making the format of the script closer to the typical format of Python scripts (at some cost: this requires more changes to the Perl code and can cause some errors). So if the resulting script has errors (pre-pythonizer will report them), do not do that, at least not initially or if your understanding of Python is still shaky and you are learning the ropes.

It also reformats the code so that curly brackets are mostly on separate lines.

It can be used as a separate program, which transforms the initial Perl script, creating a backup with the extension .original. The main useful function in the current version is refactoring of the Perl program by pushing subroutines up, as in Python subroutines need to be defined before use.
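The reason for pushing subroutines up is that in Python a def is an executable statement: a function exists only after the interpreter has executed its definition. A minimal sketch with made-up names:

```python
def main():
    # 'helper' is looked up only when main() actually runs.
    return helper(2)

try:
    main()                 # helper has not been defined yet
except NameError as err:
    print("called too early:", err)

def helper(x):
    return x * 2

print(main())              # works now that the def has been executed
```

In Perl the call would work in either order, because named subs are compiled before the script starts running.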

It needs to be run only once for each Perl script you want to translate to Python. Subsequent modifications of the Perl script to make it more "Python-compatible" can be performed on this text instead of the original script.

Pythonizer implements actual transformation of Perl into Python

Running pythonizer -h provides a list of options.

Here is an example of the protocol: Full protocol of translation of pre_pythonizer.pl by version 0.2 of pythonizer

To increase the chances of a correct translation, it is recommended to run the Perl script through pre_pythonizer.pl first.

Parts that can't be translated during the first invocation can be commented out, and you can iteratively modify and simplify the Perl code until you reach the stage when the Perl script is translated more or less OK. After that you can work in the Python debugger and iron out the remaining inconsistencies and cases of incorrect translation of statements. Sometimes the algorithm of a part of the program needs to be adapted to Python.

Some features:

Limitations

Options

NOTE: All options with small numeric values can be expressed by repeating the letter, so -p 2 is equivalent to -pp and -d 3 to -ddd.

Currently only a few user options are supported (pythonizer -h provides a list of options):

Options mainly for developers and pilot users

The same options work for pre_pythonizer, but usually the defaults are OK. There are no options to control the refactoring of the script.

Logs

Logs are written to /tmp/Pythonizer. You can redirect this to any directory via a symlink. Currently there is no option to customize the location of the log.

Structure

Pythonizer consists of the main program called, as you can guess, pythonizer, and three modules (Perlscan.pm, Pythonizer.pm and Softpano.pm). The modules can be located anywhere, as long as the main program can load them from the set of directories in the @INC array.

The total size of the codebase in version 0.8 is around 4.3K lines:

 # wc -l Perlscan.pm Pythonizer.pm  pythonizer  Softpano.pm pre_pythonizer.pl
  1123 Perlscan.pm
   474 Pythonizer.pm
  1615 pythonizer
   340 Softpano.pm
   797 pre_pythonizer.pl
  4349 total

The total size of the codebase in version 0.7 is slightly over 4K lines:

wc -l Perlscan.pm Pythonizer.pm  pythonizer  Softpano.pm pre_pythonizer.pl
  1107 Perlscan.pm
   382 Pythonizer.pm
  1515 pythonizer
   339 Softpano.pm
   825 pre_pythonizer.pl
  4168 total

The total size of the codebase in version 0.5 is slightly below 4K lines:

wc -l Perlscan.pm Pythonizer.pm  pythonizer  Softpano.pm pre_pythonizer.pl
  1051 Perlscan.pm
   317 Pythonizer.pm
  1442 pythonizer
   336 Softpano.pm
   825 pre_pythonizer.pl
  3971 total

The total size of the codebase in version 0.4 is around 3.5K lines:

wc -l Perlscan.pm Pythonizer.pm  pythonizer  Softpano.pm pre_pythonizer.pl
   866 Perlscan.pm
   292 Pythonizer.pm
  1268 pythonizer
   312 Softpano.pm
   833 pre_pythonizer.pl
  3571 total

Recommended manual transformations of Perl code

Recently Perl adopted the Python stance of minimizing the number of parentheses in calls to standard functions.

First and foremost, under-parenthesized Perl scripts (scripts that use built-in functions without parentheses, as recommended by "Modern Perl") are translated with more errors by the current version of Pythonizer than scripts in which they are parenthesized. This topic is discussed below in more detail, and some examples are provided.

Problematic new-style postfix if statements and bash-style shortcut and/or statements like

$debug && print "Something"

need to be parenthesized.

Of course, cases differ, and a function invocation as a statement or as the only part of an assignment statement is pretty safe without parentheses. But in complex expressions your mileage may vary. In cases where Pythonizer complains, adding parentheses often corrects the translation and lets you move forward.
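For orientation, this is roughly what such a shortcut statement amounts to in Python. It is a hand-written sketch, not actual Pythonizer output, and the log helper is made up so that the effect can be checked.

```python
debug = 1
messages = []

def log(text):
    # Stand-in for print, so the result can be inspected afterwards.
    messages.append(text)
    return True

# Perl: $debug && print "Something";
# The idiomatic Python form is a plain if statement.
if debug:
    log("Something")

# A short-circuit expression used as a statement is legal Python too,
# but it is rarely written this way.
debug and log("Something else")

print(messages)
```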

Also, complex expressions are sometimes parsed incorrectly by Pythonizer. In such cases you can factor them out and simplify them in order to get a correct translation, or translate them manually, as in most such cases the proper Python equivalent differs from the close to "one-to-one" translation that Pythonizer provides anyway.

But this is to be expected, given the limits on the size and complexity of this project.

Generally, Perl provides too much syntactic rope, and some programmers manage to hang themselves with it: the syntax variety allowed is overkill and stimulates the adoption of "perverted" constructs that detract from, rather than add to, the clarity of the code.

The most typical example of this kind of "syntactic perversion" is the excessive use of postfix if constructs. I would understand using them in a loop to specify an exit condition, like

last if (substr($line,0,3) eq 'EOF');

but not like

$x++ if $i<$limit;
or
print SQL "SELECT column 1 FROM $var" if (defined $var);

In the first case, writing

if ($i < $limit) { $x++; }

is only a few symbols longer but is much clearer and easier to modify. Pythonizer recognizes single statements with a postfix if or other conditional modifiers, but instead of a single statement Perl also allows a comma-separated list, which makes it essentially a block:

$i=5,$j=10 if( $next_iteration );

IMHO this provides way too much syntax flexibility ;-). You need to transform such statements into prefix if statements if you want automatic translation. Actually, the result looks more readable:

if( $next_iteration ){
   $i=5;$j=10
}

Similarly, Pythonizer will complain in the case of cascading my declarations:

my $i=my $j=1

But this construct probably makes some sense, so it might be supported in later versions.
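For comparison, here is how the constructs discussed above look when written by hand in Python. This is a sketch with made-up data, not Pythonizer output. Python has no postfix if, so every variant ends up as a prefix if anyway.

```python
lines = ["alpha", "beta", "EOF marker", "gamma"]
seen = []
for line in lines:
    # Perl: last if (substr($line,0,3) eq 'EOF');
    if line[0:3] == "EOF":
        break
    seen.append(line)

next_iteration = True
i = j = 0
# Perl: $i=5,$j=10 if( $next_iteration );
if next_iteration:
    i = 5
    j = 10

# Perl: my $i=my $j=1
# Chained assignment is the closest Python counterpart.
a = b = 1

print(seen, i, j, a, b)
```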

There are several even rarer forms of abusing Perl's syntax flexibility, but I think these examples are enough.

Here are recommendations from the famous book The Elements of Programming Style that any decent programmer should strive to adhere to:

  1. Write clearly — don't be too clever.

  2. Say what you mean, simply and directly.

  3. Use library functions whenever feasible.

  4. Avoid too many temporary variables.

  5. Write clearly — don't sacrifice clarity for efficiency.

  6. Let the machine do the dirty work.

  7. Replace repetitive expressions by calls to common functions.

  8. Parenthesize to avoid ambiguity.

  9. Choose variable names that won't be confused.

  10. Avoid unnecessary branches.

  11. If a logical expression is hard to understand, try transforming it.

  12. ... ... ...

Decision to remove opening and closing round brackets in Perl built-in functions opened a can of worms

The situation with the under-parenthesized built-in functions mentioned above is more complex, and here I think the Perl implementers made a blunder, opening a can of worms. Many statements written in this fashion are ambiguous.

So we have a problem with the language itself, which exists independently of the level of complexity allowed in Pythonizer and its limitations.

It looks like the parenthesis elimination drive has put Perl developers into a situation similar to the Chinese campaign to eliminate sparrows.

For example, today I encountered the following ambiguous statement written in "de-parenthesized" "Modern Perl":

get_config(split /$s/,$_);

Here it is completely unclear whether

  1. We have two arguments to get_config and one argument to split (the regular expression /$s/, with split assumed to operate on $_)
  2. We have two arguments to split (/$s/ and $_) and one argument (an expression) to get_config.

The interpreter accepts it as case (2), which IMHO is correct but still pretty arbitrary, especially if you write

get_config(split /$s/,$i)

instead.

   DB<100> sub get_config{ print scalar(@_), join(' ',@_),"\n"} 
   DB<101> $s=' '; 
   DB<102> $_=qq(abba baba); 
   DB<103> get_config(split /$s/,$_); 
   2abba baba
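In Python the parentheses are mandatory, so the two readings have to be spelled differently. The following hand-written sketch mirrors the debugger session above; get_config here is a Python stand-in for the Perl test sub, and line stands in for $_.

```python
import re

def get_config(*args):
    # Mirrors the Perl test sub: argument count followed by the arguments.
    return str(len(args)) + " ".join(args)

s = " "
line = "abba baba"

# Reading (2), the one Perl chooses: split gets both arguments and its
# result list is flattened into the argument list of get_config.
print(get_config(*re.split(s, line)))

# Reading (1): split works on the default variable, and the same string
# is then passed to get_config as one more argument.
print(get_config(*re.split(s, line), line))
```

The first call prints 2abba baba, matching the Perl session. Note that re.split and Perl's split differ in details (for example, Perl drops trailing empty fields), so this is an illustration of the parsing question only.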

Unless this problem is fixed, this is an argument for sticking to earlier versions of Perl 5 and avoiding "de-parenthesized" "Modern Perl". It looks like the "parenthesis eliminators" essentially harmed the language while trying to help. That happens, but it is a rather sad situation.

In any case, versions after 5.26 will probably reach commercial Linux distributions only in a 5 to 10 year time frame, so there is still time to fix this.

Other recommended transformations of Perl code

While parenthesizing built-in functions in statements that cause problems for Pythonizer is the number one recommendation, there are several others.

In many cases you can get a more correct translation from Pythonizer by simplifying the Perl code. Some transformations can be done via a macro generator like M4, some via Perl itself using regular expressions.

If the pre-pythonizer transformation of the Perl code produces errors, you need to fix them before submitting the code to pythonizer.

Pythonizer expects syntactically correct Perl code.

If it hangs or goes into an infinite loop, you need to comment out the offending line(s) and try again. You can use option -d 3 to find the line if it is not clear from the listing.

NOTE: To increase the chances of success when translating long scripts that cause problems, you can split your code into several chunks (for example, the main program and subs) and try to convert them chunk by chunk. Then you can merge the code and add global declarations manually.

Also, the old form of the open statement, where the opening mode is prefixed to the file name, is better converted to the new form, where the mode is a separate argument, as not all modes are correctly recognized in the old format. Typically there are very few open statements in a program, so this can easily be done.
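The separate-mode form also maps more directly onto Python's open(name, mode). The sketch below shows the correspondence between the Perl mode prefixes and Python mode strings; the helper split_two_arg_open is hypothetical, handles only plain file modes (no pipes or dup modes), and is not part of Pythonizer.

```python
import re

# Perl mode prefix -> Python text-mode string
PERL_TO_PYTHON_MODE = {
    "<": "r", ">": "w", ">>": "a",
    "+<": "r+", "+>": "w+", "+>>": "a+",
}

def split_two_arg_open(spec):
    """Split an old-style 'mode prefix + file name' string into (python_mode, name)."""
    m = re.match(r"\s*(\+?(?:>>|<|>))?\s*(.*?)\s*$", spec)
    prefix = m.group(1) or "<"     # no prefix means open for reading
    return PERL_TO_PYTHON_MODE[prefix], m.group(2)

# Perl: open(LOG, ">>log.txt")   ->   open(LOG, '>>', 'log.txt')
print(split_two_arg_open(">>log.txt"))
print(split_two_arg_open("data.txt"))
```

With the result in hand, the Python call is simply open(name, mode).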

Excessive syntax flexibility might help to explain the flimsy error reporting of the Perl interpreter, which accepts clearly syntactically incorrect programs as valid. This situation is worse in the debugger than in the interpreter itself.

Version 5.26.3 also refuses to function properly in some cases, probably hitting some internal bugs or corruption of internal data structures, which manifests itself in a certain statement being executed incorrectly. But it recovers if you change your Perl code a little bit. I experienced one such situation with pythonizer when it was slightly less than 900 lines. Luckily, as I added code the problem disappeared. But I seriously thought about switching to Golang at that point.

Debugging generated Python code

The Python debugger is still inferior to the Perl debugger, so working with it is both more difficult and more time-consuming.

The standard invocation is something like

python3.8 -m pdb maintest.py

You can run the script until the first error with the command 'c', and if you are lucky, at that point the diagnostic message produced by the Python interpreter gives you some useful information about what is wrong. If not, you need to work step by step and determine the context of the error yourself.

See Python Debugging for some additional information and tips.

Generally, to find the first error you can run the generated code with the command c and proceed from that point.

But the fact that the code runs to this point does not mean that it executed correctly: you need to verify that.

For some additional ideas see Perl to Python translation

Known errors

  1. Pythonizer does not like some statements and standard function calls with omitted round brackets (especially in complex expressions) and can mark them untranslatable, while translating them correctly if the brackets are added.
  2. For long scripts, to increase the chances of success you can split your code into several chunks (for example, the main program and subs) and try to convert them chunk by chunk.
  3. In version 0.8 filehandles are not included in the list of global variables and as such are not propagated correctly if file operations on a given filehandle are spread across several different subroutines.
  4. State variables are now simply mapped to the global namespace, which can create conflicts and corrupt their values at run time. So the usage of state variables currently needs to be manually reviewed to detect such conflicts. They are rare, but they happen. One possible solution is to rename them using the sub name as a prefix. For example, in sub maxi the state variable limit can be renamed maxi_limit.
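The renaming suggested in item 4 can be sketched as follows. The sub mini and the bodies of both subs are made up for illustration; only the maxi/limit naming comes from the text above.

```python
# Perl, for reference: both subs declare their own 'state $limit',
# so a plain mapping to one global 'limit' would make them collide.

maxi_limit = 0       # was: state $limit in sub maxi
mini_limit = 1000    # was: state $limit in sub mini

def maxi(value):
    global maxi_limit
    if value > maxi_limit:
        maxi_limit = value
    return maxi_limit

def mini(value):
    global mini_limit
    if value < mini_limit:
        mini_limit = value
    return mini_limit

print(maxi(5), maxi(3), mini(7), mini(9))
```

With the prefix each sub keeps its own value between calls, as the Perl state variables did.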

Troubleshooting

The most common error that requires troubleshooting looks like this: Pythonizer chokes on some line and you get the following message on the screen:

Deep recursion on subroutine "main::expression" at pythonizer line 1638, <> line 72.

To troubleshoot this situation, you need to run pythonizer via perl with the -d option:

perl -d pythonizer -d 3 problematic_script.pl

The last line shown is the problematic line that you need to modify:
 === Line 71 Perl source:while ( ( my $read_count = sysread( $file, $buffer, 4096 ) ) > 0 )===


Line:   72 TokenStr: =|c((s=i(s,s,d))>d)|= @ValPy: while ( ( read_count := sysread ( file , buffer , 4096 ) ) > 0 )
Perlscan::gen_chunk(/cygdrive/f/_Scripts/Pl2py/Perlscan.pm:1157):
1157:         push(@PythonCode,$_[$i]);
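If you decide to translate such a loop manually, note that Perl's sysread fills the buffer passed to it, while Python's read returns the data. A hand-written sketch of the loop using an assignment expression (Python 3.8 or later) could look like this; copy_in_chunks and the in-memory streams are made up for illustration.

```python
import io

def copy_in_chunks(src, dst, chunk_size=4096):
    """Copy a binary stream in fixed-size chunks; return the number of bytes copied."""
    total = 0
    # Perl: while ( ( my $read_count = sysread( $file, $buffer, 4096 ) ) > 0 )
    # read() returns the chunk instead of filling a buffer argument,
    # so the byte count is derived from the returned data.
    while len(buffer := src.read(chunk_size)) > 0:
        read_count = len(buffer)
        dst.write(buffer)
        total += read_count
    return total

source = io.BytesIO(b"x" * 10000)
target = io.BytesIO()
print(copy_in_chunks(source, target))
```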

Submission of tickets

Tickets can be opened on GitHub.

The problem should be reproducible on the most recently uploaded version; non-reproducible tickets related to earlier versions will be ignored. The information provided should be enough to reproduce the error.

It is recommended that you run pythonizer with the debugging option -d 3 to generate additional output; the relevant fragment (only the relevant fragment) can be attached to the ticket as a file.

Samples that cause internal errors also need to be attached as files to save my time.

History

Version 0.8 uploaded

Changes since version 0.7

The possibility to generate code for Python 2.7 and the option -p were removed.

Option -w was added, which allows you to specify the width of the line in the protocol. The default is 188, suitable for the Fixedsys 11-point font on a 24" screen.

More correct translation of array assignments. Some non-obvious bugs in translation were fixed. Now you need to set the PERL5LIB variable to point to the directory with the modules in order to run the program. Global variables are now initialized to undef after the main sub to create a global namespace; previously this was done incorrectly. A simple installer was added for Python programmers who do not know much Perl: the program proved to be useful as an aid to understanding Perl scripts for Python programmers.

Significantly more cases of using built-in functions without parentheses are translated correctly.

Changes in pre_pythonizer.pl

Unlike previous versions, the current version by default does not create a main subroutine out of the statements found on nesting level zero, as this introduces some errors. You need to specify option -m to create it.

NOTE: All Python statements on nesting level zero should start at the beginning of the line, which is ugly, but you can enclose them in a dummy if statement

if True: 

to create an artificial nesting level 1
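A minimal sketch of the trick, with made-up statements:

```python
# Statements on nesting level zero must start in the first column.
total = 0

# The dummy 'if True:' opens an indented block, so the statements below
# can keep one level of indentation.
if True:
    for n in (1, 2, 3):
        total += n
    print(total)
```

The if statement does not create a new scope, so variables assigned inside the block remain ordinary module-level variables.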

Version 0.7 uploaded

Changes since version 0.6.

This version creates a list of global variables for each subroutine to maintain the same visibility in Python as in Perl, and generates a global statement with the list of such variables, which is inserted into each Python subroutine definition if pythonizer determines that the subroutine accesses global variables. The list might be excessive.

Version 0.6 uploaded

Changes since version 0.5.

Regular expressions are now translated more correctly. Shortcut ifs like (debug>0) && say $line are translated in a more general way than before. This is the first version that translates the main test (pre_pythonizer.pl) without syntax errors. The generated source starts executing in the Python interpreter and runs until the first error. A list of internal functions was created. Translation of backquotes and the open statement was improved.

Version 0.5 uploaded

Changes since version 0.4

Translation of regular expressions and the tr function was improved. Many other changes and error corrections. The -r (refactor) option was implemented to allow refactoring of the Perl source via pre_pythonizer.pl in an integrated fashion.

Version 0.4 uploaded

Changes  since version 0.3

Version 0.3 uploaded

Changes since version 0.2:



Old News ;-)

[Oct 05, 2020] Version 0.8 uploaded


[Sep 18, 2020] Version 0.7 uploaded

Changes since version 0.6

This version creates a list of global variables for each subroutine to maintain the same visibility in Python as in Perl, and generates a global statement with the list of such variables, which is inserted into each Python subroutine definition if pythonizer determines that the subroutine accesses global variables.

So far the specifics of Perl state variables are ignored, and they are assumed to be yet another type of global variable (they generally do not belong to the global namespace: while their lifetime is similar to that of global variables, their namespace is local).

[Sep 08, 2020] Version 0.6 uploaded


[Aug 31, 2020] Version 0.5 uploaded

Changes since version 0.4

[Aug 22, 2020] Version 0.4 uploaded

Changes since version 0.3

[Aug 17, 2020] Version 0.3 was uploaded

Changes since version 0.2:
