.docx to .pdf on Linux

How do you convert a folder full of .docx files to PDF on Linux with a Python script? The struggle is real. On Windows the solution is simple in Python, thanks to a handy library.

Unfortunately, the library requires an MS Office installation on the system. So what do we do? Use an online API? No way!
LibreOffice comes to our rescue with its handy little command-line helper, soffice.
Although LibreOffice does not always handle .docx files perfectly, I have personally found the conversions to be trouble-free. All that is needed now is a command.

The general syntax is this:

soffice --headless --convert-to <TargetFileExtension>:<NameOfFilter> file_to_convert.xxx
  1. --headless - Starts LibreOffice in “headless mode”, which allows using the application without a GUI (see the LibreOffice documentation).
  2. <TargetFileExtension> - This will be pdf in our case.
  3. <NameOfFilter> - Filter names are listed in the LibreOffice documentation and depend on the type of document you are converting. A quick Ctrl+F search on that page turns up “writer_pdf_Export”, the PDF export filter for Writer documents such as .docx files (“calc_pdf_Export” is its counterpart for spreadsheets).
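
To make the pieces of that syntax concrete, here is a minimal Python sketch that assembles the command as an argument list. The function names convert_to_argument and soffice_command are my own, made up for illustration; the only soffice options used are the ones from the syntax above. It assumes the filter part is optional (LibreOffice picks a default export filter when only the extension is given) and uses “writer_pdf_Export”, the PDF export filter for Writer documents.

```python
import shlex


def convert_to_argument(target_extension, filter_name=None):
    """Build the value that follows --convert-to.

    Hypothetical helper: returns "ext:filter" when a filter name is
    given, otherwise just "ext".
    """
    if filter_name:
        return f"{target_extension}:{filter_name}"
    return target_extension


def soffice_command(input_file, target_extension, filter_name=None):
    """Return the full soffice command as a list of arguments."""
    return [
        "soffice",
        "--headless",  # run without a GUI
        "--convert-to",
        convert_to_argument(target_extension, filter_name),
        input_file,
    ]


# Show the command as it would be typed in a terminal.
command = soffice_command("name_of_doc_file.docx", "pdf", "writer_pdf_Export")
print(shlex.join(command))
```

Nothing is executed here; the sketch only builds and prints the command, so it is safe to run on a machine without LibreOffice.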

Here is an example:

Just open your terminal and enter the following:

soffice --headless --convert-to pdf:"writer_pdf_Export" name_of_doc_file.docx

This is for one file.

When automating this in Python (in a Jupyter notebook), it is as easy as putting an exclamation mark before the command, and it runs seamlessly with the rest of the script.

!soffice --headless --convert-to pdf:"writer_pdf_Export" name_of_doc_file.docx

(This works because Jupyter Notebook passes any command that starts with an exclamation mark (!) to the underlying OS shell.)
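
Outside a notebook the exclamation mark is not available, so a plain Python script has to start soffice itself. Here is a rough sketch of how the whole-folder case could look with the standard subprocess module. The helper names (build_command, convert_folder) and the run parameter are hypothetical and exist only for this example; --outdir is the soffice option that sets the output directory. The sketch assumes soffice is on the PATH and converts the files one at a time.

```python
import subprocess
from pathlib import Path


def build_command(docx_path, out_dir, soffice="soffice"):
    """Return the argument list that converts one .docx file to PDF."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf:writer_pdf_Export",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_folder(folder, out_dir=None, run=subprocess.run):
    """Convert every .docx file directly inside `folder`.

    Returns the PDF paths soffice is expected to write; it does not
    verify that the files actually exist afterwards.
    `run` can be swapped out, e.g. for a dry run.
    """
    folder = Path(folder)
    out_dir = Path(out_dir) if out_dir else folder
    out_dir.mkdir(parents=True, exist_ok=True)

    expected = []
    for docx_path in sorted(folder.glob("*.docx")):
        # check=True raises CalledProcessError on a non-zero exit code.
        run(build_command(docx_path, out_dir), check=True)
        expected.append(out_dir / (docx_path.stem + ".pdf"))
    return expected
```

Calling convert_folder("/path/to/folder") would then write the PDFs next to the originals, and passing a second argument sends them to a separate output folder instead. Sub-folders are not searched in this sketch.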

Note that although .docx -> .pdf is demonstrated here, soffice is extremely powerful and supports many other conversions, including doc -> txt, xlsx -> xlc and more. Read the documentation for your specific use case.

Did I miss something? Improvements? Suggestions?
Feel free to reach me on my E-mail: tellisstephen(at)gmail(dot)com

*****
"A chance not taken is an opportunity missed."