(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
Rumors are that Microsoft Office 2012 will support PDF import. There is also an add-on for Microsoft Office 2007 that lets you export and save to the PDF and XPS formats in eight 2007 Microsoft Office programs. It also lets you send files as e-mail attachments in the PDF and XPS formats in a subset of these programs.
Nuance used to have a PDF converter, but it was not very good.
Linux Journal
The Sun PDF Import Extension is one of the most popular OpenOffice.org extensions ever created. For the last two years, it has been near the top of the list of most popular downloads on the OpenOffice.org Extensions site -- and no wonder, considering that it is a free replacement for Adobe Acrobat, which is currently priced at $449US. However, the extension does have some quirks and limitations that you have to learn to work around.
The first quirk you have to overcome is obtaining it. To start with, you need to be running OpenOffice.org 3.0 or higher.
That is probably not a problem for most users, but finding a usable copy of the extension may be. When you click the Get it! button on the extensions site, the link takes you to a page about Oracle Open Office, the successor to Sun Microsystems's Star Office. This page mentions the PDF Import Extension, but provides no downloads.
To download the extension, you need to be alert when your browser switches to the page that thanks you for downloading, and choose a manual download before you can get the file.
Even then, to judge from the comments on the extensions page (and my own experience), you may have trouble using the extension after you install it from Tools -> Extension Manager. The easiest way to get the extension is to check your distribution's repository to see if it is included as a package, as in Debian.
You will know that the installation succeeded if you open a PDF file and it displays in Draw.
By contrast, if you get a few characters of gibberish, you need to keep searching for another way of getting the extension. You might be able to find an alternative download site with an earlier version that you can use. Don't worry if the version number is far below the 1.01 release mentioned on the extension page; the version numbers took a huge, unwarranted leap, and (so far as I can tell) a .4x version will not be much different in functionality from the 1.01 release.
Using the extension
Once you have the Sun PDF Import Extension installed, you need to know its limitations. Unfortunately, it's a mixture of good and bad news.
The good news is that the extension works extremely well with text, preserving all types of formatting including font size, bold, italics, strike-through and underlining. Fonts, too, are preserved, although their names are not always parsed correctly and may have a few additional characters at the end of them. Should the fonts not be available on your system, the extension tries to replace them with a font whose characters are metrically equivalent. The positioning of text is also maintained in all-text documents, so that a brochure that has text scattered over the page is imported as accurately as a white paper that is a solid block of paragraphs.
The extension places each line of text in a separate text frame. Each fragment of a line separated by a tab or spacing is also placed in a separate text frame. This arrangement means that you can easily correct typos, or add a few words if the line is short. Add much more, and you will throw off the line spacing in the document. You can, of course, add your own text frames, but you will have to work carefully not to interfere with the line spacing or the bottom margin -- to say nothing of moving every line carefully downwards. Still, the effort may be worthwhile if you need to edit or recover an important document.
Another problem is that true Adobe forms and graphics are not imported at all. At the most, you will have only their frames, and, at times, especially with PNG graphics, the positioning of text will be thrown off by the missing elements. In these cases, if you want to include the forms or graphics included in a PDF made outside of OpenOffice.org, then you will have to capture them and insert them manually into the Draw document.
If you import a PDF created within OpenOffice.org, you may be able to import forms and graphics -- providing that you set the PDF to Hybrid format when you exported the file. A Hybrid PDF combines Acrobat and Open Document formats. A PDF reader like Adobe Acrobat that cannot parse Open Document Format will simply ignore it, but, when you come to import the file into OpenOffice.org for editing, the forms and graphics will be imported along with the text. The cost of using Hybrid format is that your files will be an average of about 20% larger, but that is a relatively small price to pay for the convenience of the kludge.
Finally, when you are finished editing, remember not to save the file, but to use File -> Export to PDF instead.
Converting a PDF file into an HTML or XML file is made easy by a small, useful utility called pdftohtml. pdftohtml is an Xpdf-based tool that converts PDF files to HTML or XML format. It also supports encrypted files and handles images in the PDF file by converting them to PNG image files.
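A minimal sketch of driving pdftohtml from a script is shown here. It assumes the poppler-utils build of pdftohtml is installed and that its `-xml`, `-noframes` and `-upw` options behave as in that build; the helper names `build_pdftohtml_command` and `convert` are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, out_base, as_xml=False, user_password=None):
    """Return the argument list for one pdftohtml run (nothing is executed here)."""
    cmd = ["pdftohtml"]
    if as_xml:
        cmd.append("-xml")        # write XML instead of HTML
    else:
        cmd.append("-noframes")   # write a single HTML file instead of a frameset
    if user_password is not None:
        cmd += ["-upw", user_password]  # user password for an encrypted PDF
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def convert(pdf_path, out_base, **options):
    """Run pdftohtml; raise if the tool is missing or exits with an error."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml is not on PATH")
    subprocess.run(build_pdftohtml_command(pdf_path, out_base, **options), check=True)


if __name__ == "__main__":
    # Only prints the command, so this runs even without pdftohtml installed.
    print(build_pdftohtml_command("report.pdf", "report", as_xml=True))
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to inspect or log the exact command before any file is touched.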
The extension installs as easily as any OpenOffice.org or Firefox extension. OpenOffice.org extensions cannot register file associations with the operating system (though you can set them up manually), but importing a PDF is as simple as clicking on File and then Open. The import process takes a long time compared to opening an OpenOffice.org document because of the necessary guesswork caused by the limitations of PDF.
For a test, I exported ODF_text_reference_v1_1.odt from OpenOffice.org and imported it again. When the initial screen appeared with the results, I stared at it in disbelief. It looked just like the original. The text, layout, font faces, text colors, bold, italics, underline, and picture were well preserved.
Below are the original in Writer and the imported document in Draw. Doesn't it take more than a glance to identify which is the original?
Alternative PDF import
OpenOffice.org did not pioneer PDF import, not even in the open source market. Some of the work in OpenOffice.org is done by xpdf, a PDF viewer. To import PDFs, open source alternatives include pdftohtml, Abiword, KWord, and Inkscape. There are also a host of proprietary applications.
Depending on your needs, there are other ways to import PDFs into OpenOffice.org. To import PDFs into Writer or Impress, you may be able to combine the new PDF import extension with copy and paste. If you just need to extract text, copy the text in Adobe Acrobat Reader and paste it into OpenOffice.org. This retains some formatting.
About: html2ps is a PHP equivalent of the popular Perl script by the same name that accurately converts HTML with images, complex tables (including rowspan/colspan), layers/divs, and CSS styles to Postscript and PDF. Unlike most other HTML2PS/HTML2PDF converters, it offers good CSS 2.1 support and is very tolerant of invalid HTML. It can convert even CSS-intense sites like aol.com and msn.com.
Changes: A large number of layout engine fixes and improvements were made.
About: pisa converts HTML to PDF using the ReportLab Toolkit, the HTML5lib, and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). The main benefit of this tool is that a user with Web skills like HTML and CSS is able to generate PDF templates very quickly without learning new technologies.
Changes: New features: barcode and a table of contents. Many bugfixes. Better CSS support.
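The template idea can be sketched as follows. This is only an illustration, under the assumption that pisa is installed in its current packaging, xhtml2pdf, which exposes `pisa.CreatePDF`. The template, `render_html` and `write_pdf` are hypothetical names made up for this example, and the import is done lazily so that the HTML part can be tried without the library.

```python
from html import escape
from string import Template

# Plain HTML and CSS are the whole "template language".
REPORT_TEMPLATE = Template("""<html>
<head><style>
  h1 { font-size: 18pt; }
  td { padding: 2pt 6pt; }
</style></head>
<body>
<h1>$title</h1>
<table>$rows</table>
</body></html>""")


def render_html(title, items):
    """Fill the template with a title and (name, value) pairs, escaping the text."""
    rows = "".join(
        f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
        for name, value in items
    )
    return REPORT_TEMPLATE.substitute(title=escape(title), rows=rows)


def write_pdf(html, pdf_path):
    """Convert an HTML string to a PDF file; return True if no errors were reported."""
    from xhtml2pdf import pisa  # assumption: third-party package is installed
    with open(pdf_path, "wb") as handle:
        status = pisa.CreatePDF(html, dest=handle)
    return not status.err


if __name__ == "__main__":
    print(render_html("Disk usage", [("/home", "71%"), ("/var", "38%")]))
```

Calling `write_pdf(render_html(...), "report.pdf")` would then produce the PDF, provided the library is available.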
About: pdf2djvu creates DjVu files from PDF files. It's able to extract: graphics, text layer, hyperlinks, document outline (bookmarks), and metadata.
pdf2htmlEX renders PDF files in HTML using modern Web technologies. It aims to provide accurate rendering while staying optimized for Web display.
It is optimized for modern web browsers such as Mozilla Firefox & Google Chrome.
This program is designed for scientific papers with complicated formulas and figures, so precise rendering is the #1 concern. General PDF files are also supported.
Features:
- Single HTML file output
- Precise rendering
- Text Selection
- Font embedding & reencoding for Web
- Proper styling (Color, Transformation...)
- Optimization for Web
Not supported yet:
- Type 3 fonts
- Non-text objects (don't worry, they will be rendered as images)
Oracle PDF Import Extension Repository for Apache OpenOffice Extensions
Gernot's interactive Postscript to PDF converter
freshmeat.net Project details for pdfasm
pdfasm.pl is a simple preprocessor for PDF files. It can remove comments, include other files, automatically count stream lengths, and build the xref table. With pdfasm.pl you can write a PDF file in a plain text editor.
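To show what counting stream lengths and building the xref table involves, here is a small independent sketch. It does not reproduce pdfasm.pl or its input syntax; the function names are hypothetical, and the one-page "Hello" document is just an assumed test input.

```python
def stream_object(data):
    """Wrap raw bytes as a PDF stream object body with a computed /Length."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


def assemble_pdf(objects):
    """Number the object bodies 1..n, record byte offsets, append xref and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)


def hello_pdf():
    """Build a one-page document; object 1 must be the catalog."""
    content = b"BT /F1 24 Tf 72 720 Td (Hello, PDF) Tj ET"
    return assemble_pdf([
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        stream_object(content),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ])


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:
        handle.write(hello_pdf())
```

The offsets and the stream length are exactly the bookkeeping that is tedious to maintain by hand when writing a PDF in a text editor, which is why a preprocessor for it is useful.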
ScanSoft - PDF Solutions - PDF Converter 3
This is a pretty revolutionary product that integrates with MS Word and provides much-needed functionality at a fraction of the cost of Adobe Acrobat. You need to have MS Word to install the product, as it is a plug-in and not a standalone program. Integration is seamless: you get an additional item, Open PDF, in the File menu.
Based on my experience, it converts fairly complex documents and presentations from PDF to MS Word 2003 with rather high quality. Conversion quality is excellent (almost perfect for text in all the documents that I tried). Presentations are also converted very well (those that I tried were rather simple, mainly text-slide PDF presentations).
I did not test it on documents with complex layouts (newspaper-type documents).
You can convert from MS Word format to any format supported by MS Word, including HTML, but the generated HTML is very complex.
I also noticed that conversion of long documents is rather slow.
Accessibility Tools page -- Adobe PDF2HTML conversion page
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: March 12, 2019