Edit LibreOffice format file internals

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

#1 Post by webfork

EDIT: Editing LibreOffice file contents is probably better done (or more immediately useful) with the "flat" format process detailed in post #4, later in this thread.

===

This is probably described elsewhere in better detail, but one of the things that has intensely frustrated me about Word/Excel/PowerPoint is that you cannot open up their supposed "Open XML" format and get anything remotely readable. It's not a clear format.


In searching for alternatives (outside of HTML), LibreOffice seemed like the best route, but I couldn't get inside a LibreOffice ODT file and make changes either, so I did some digging. It turns out that if you want to modify OpenDocument XML files, it's easy to do.
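To see what "inside" means here: an ODT/ODS file is an ordinary ZIP archive. The document body lives in `content.xml`, styles in `styles.xml`, a `META-INF/manifest.xml` lists the parts, and a small `mimetype` entry names the file type. Below is a minimal Python sketch (standard library only) that lists the entries and reads that `mimetype` entry; the function name `describe_package` and any filenames are hypothetical.

```python
import zipfile


def describe_package(source):
    """List the entries of an ODF package and return its declared mimetype.

    `source` can be a path (e.g. a hypothetical "budget.ods") or a binary
    file object; an ODF package is an ordinary ZIP archive.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # The "mimetype" entry holds the media type as plain ASCII text.
        mimetype = zf.read("mimetype").decode("ascii")
    return names, mimetype
```

For a real spreadsheet, `describe_package("budget.ods")` should report `application/vnd.oasis.opendocument.spreadsheet` along with entries such as `content.xml` and `styles.xml`.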

Why would you modify LibreOffice internal content?

You can run a variety of batch operations that change formatting, content, and more, using tools like regular-expression find/replace. Even if you don't normally work in that file format, the edited files are very easy to export to Word, Google Docs, HTML, and a dozen other formats.
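As a concrete, hypothetical example of such a batch operation, the Python sketch below applies one regular-expression find/replace to the raw XML of an extracted `content.xml` or `styles.xml` and reports how many matches it changed. `fo:font-size` is a real ODF formatting attribute, but the sample XML is made up. Regex edits on XML are brittle (a pattern that spans tags can corrupt the file), so keep patterns narrow and work on a copy.

```python
import re
from typing import Tuple


def batch_replace(xml_text: str, pattern: str, replacement: str) -> Tuple[str, int]:
    """Apply one regex find/replace to raw ODF XML; return (new_text, hits)."""
    return re.subn(pattern, replacement, xml_text)


# Hypothetical sample: two text styles, only the 12pt one should change.
sample = (
    '<style:text-properties fo:font-size="12pt"/>'
    '<style:text-properties fo:font-size="14pt"/>'
)
updated, hits = batch_replace(sample, r'fo:font-size="12pt"', 'fo:font-size="11pt"')
```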

Steps:
  1. Copy your ODS (spreadsheet) file to a separate file called OUTPUT.ODS.ZIP
  2. Using a compression tool like 7-Zip, open OUTPUT.ODS.ZIP and delete everything except the file "mimetype"
  3. Extract the original ODS file to a folder and make whatever changes are needed to the various files inside (batch changes, etc.)
  4. Using 7-Zip, copy the contents of that folder (except "mimetype", which is already in place) back into OUTPUT.ODS.ZIP
  5. Rename OUTPUT.ODS.ZIP back to OUTPUT.ODS and open it
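Steps 1 to 5 can also be scripted. Here is a minimal Python sketch using only the standard-library `zipfile` module. It assumes you only edit files that already exist in the package (adding or removing files would also mean updating `META-INF/manifest.xml`), and `repack_odf` and its optional `transform` hook for `content.xml` are hypothetical names, not part of any LibreOffice API.

```python
import zipfile


def repack_odf(src, dst, transform=None):
    """Copy an ODF package from src to dst, optionally rewriting content.xml.

    src/dst may be paths or binary file objects. `transform` is an optional
    str -> str function applied to content.xml (hypothetical hook).
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # Like step 2: "mimetype" goes in first and is stored uncompressed.
        zout.writestr("mimetype", zin.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        # Like steps 3-4: copy every other entry, editing content.xml on the way.
        for info in zin.infolist():
            if info.filename == "mimetype":
                continue
            data = zin.read(info.filename)
            if transform is not None and info.filename == "content.xml":
                data = transform(data.decode("utf-8")).encode("utf-8")
            zout.writestr(info.filename, data, compress_type=zipfile.ZIP_DEFLATED)
```

A hypothetical call would be `repack_odf("budget.ods", "OUTPUT.ods", lambda xml: xml.replace("2019", "2020"))`. Note that a plain `str.replace` touches markup as well as visible text.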

If that seems tedious, it's because the mimetype file needs to remain uncompressed (and stay first in the archive) so that programs can easily detect the file type: https://en.wikipedia.org/wiki/OpenDocum ... cification
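The rule behind those steps can be checked directly. My understanding of the ODF packaging rules is that `mimetype` must be the first entry, stored without compression, and without a ZIP "extra field". The hypothetical `check_mimetype_entry` below reports any of those problems, and an empty list means the package looks fine on this point. It is a sketch, not a full validator.

```python
import zipfile


def check_mimetype_entry(source):
    """Return a list of problems with the package's "mimetype" entry."""
    problems = []
    with zipfile.ZipFile(source) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        info = next((i for i in infos if i.filename == "mimetype"), None)
        if info is None:
            problems.append("mimetype entry is missing")
            return problems
        # ZIP_STORED means the bytes are kept as-is, i.e. uncompressed.
        if info.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if info.extra:
            problems.append("mimetype has an extra field")
    return problems
```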

And of course a CLI method is available to enable batch operations:
https://ask.libreoffice.org/en/question ... ment-file/
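For the command-line route, one commonly used approach (my assumption; it may not be exactly what the linked answer describes) is LibreOffice's headless conversion: `soffice --headless --convert-to <format> --outdir <dir> <files>`. The Python sketch below only assembles and runs that command. It assumes `soffice` is on your PATH (on Windows it usually sits in LibreOffice's `program` folder), and `build_convert_command` and `convert_files` are hypothetical helper names.

```python
import subprocess
from typing import List, Sequence


def build_convert_command(files: Sequence[str], target: str, outdir: str,
                          soffice: str = "soffice") -> List[str]:
    """Assemble a headless LibreOffice conversion command for several files."""
    return [soffice, "--headless", "--convert-to", target, "--outdir", outdir, *files]


def convert_files(files: Sequence[str], target: str, outdir: str,
                  soffice: str = "soffice") -> None:
    """Run the conversion; raises CalledProcessError if soffice exits non-zero."""
    subprocess.run(build_convert_command(files, target, outdir, soffice), check=True)
```

A hypothetical batch run would be `convert_files(["q1.ods", "q2.ods"], "xlsx", "exported")`.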
Last edited by webfork on Thu Oct 01, 2020 9:43 pm, edited 1 time in total.

Midas
Posts: 6705
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: Edit LibreOffice format file internals

#2 Post by Midas

webfork wrote: One of the things I've been intensely frustrated with Word/Excel/PowerPoint is how you cannot open up their supposed "Open XML" format and get anything remotely readable. It's not a clear format...

Flashback to the OOXML (nothing to do with OpenOffice, mind you) vs ODF crookery:

An Ars Technica article sources Groklaw stating that at Portugal's national body TC meeting, "representatives from Microsoft attempted to argue that Sun Microsystems, the creators and supporters of the competing OpenDocument format (ODF), could not be given a seat at the conference table because there was a lack of chairs."


E.g., see also:

#3 Post by webfork

... could not be given a seat at the conference table because there was a lack of chairs.
I read recently that people don't buy software, they buy standards. That puts into perspective a lot of moves I've seen over the years to promote formats with low technical value, and to do so in a shady way.

From this link: www.zdnet.com/article/microsoft-why-the ... e-matters/
Our end goal is interoperability, and we are working with our partners to develop solutions for any incompatibilities that customers care about ... As far as we know, the Translator does translate, open and save back and forth between ODF and Open XML
That makes me laugh. If you're sending someone an ODT file, convert it to DOCX in Word before sending. Microsoft Word's terrible importer is still junk 13 years later (the article was from 2007). And I enjoy every opportunity to make Microsoft productivity tools the *last* step in my workflow.

#4 Post by webfork

Months ago I started this thread talking about how to modify LO file internals, but today I found a generally better method to view and modify the format: a pure XML option called "Flat XML ODF Text Document" (FODT, or FODS / FODP for spreadsheets and presentations, respectively). The contents are referred to as "flat" because they don't contain subfolders or separate files; it's all one long text document.
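Because a flat file is a single XML document, ordinary XML tooling can read it without unzipping anything. Here is a minimal Python sketch using the standard-library ElementTree parser. It reads the `office:mimetype` attribute on the root `office:document` element and collects the text of every `text:p` paragraph (headings are `text:h` and are skipped here). The namespace URIs are the standard ODF ones, while `read_flat_text` and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_flat_text(xml_data):
    """Return (mimetype, paragraphs) from the XML text or bytes of a flat ODF file."""
    root = ET.fromstring(xml_data)
    mimetype = root.get(f"{{{OFFICE_NS}}}mimetype")
    # itertext() also picks up text nested in spans such as <text:span>.
    paragraphs = ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
    return mimetype, paragraphs
```

For a real file you could pass the raw bytes, e.g. `read_flat_text(open("notes.fodt", "rb").read())` with a hypothetical filename.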

I couldn't find much background on the topic, so I decided to do a quick write-up.

---

Steps
  1. Create a file and save it in "flat" format:

    View of the Save As dialog (Windows 10)
  2. Open it in a text editor (ideally one that highlights XML):

    Notepad++ showing the FODT file
  3. Make changes, save, and re-open in LibreOffice. It also works the other way: change things inside the document in LibreOffice and save, and the text editor (in the case of Notepad++) will prompt you to reload the file to show the changes.

Advantages
  • Make formatting changes in a text editor rather than in LibreOffice. This is useful because LO formats have many viewers on mobile platforms but few editors.
  • There are some tools that can search inside DOCX and ODT files, but there are hundreds that can search plain text files.
  • You can make batch changes to both content and formatting and then save normally, so you can forgo commercial batch-editing tools like BinaryMark (not freeware).
  • Because ODT converts easily to DOCX and is very good about maintaining formatting, this is a way to avoid formatting headaches.
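To illustrate the search point, here is a small Python sketch that walks a folder and reports every line in `.fodt`, `.fods`, or `.fodp` files matching a regular expression, much as any text-search tool would. The folder layout and the `search_flat_files` name are hypothetical. Since the files are plain XML, a match may land inside markup rather than visible text.

```python
import re
from pathlib import Path

FLAT_EXTENSIONS = {".fodt", ".fods", ".fodp"}


def search_flat_files(folder, pattern):
    """Yield (path, line_number, line) for regex matches in flat ODF files."""
    regex = re.compile(pattern)
    for path in sorted(Path(folder).rglob("*")):
        # Skip directories and anything that is not a flat ODF file.
        if not path.is_file() or path.suffix.lower() not in FLAT_EXTENSIONS:
            continue
        with path.open(encoding="utf-8") as fh:
            for line_number, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield path, line_number, line.rstrip("\n")
```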
Issues

Because these are plain text files, the content is not compressed, and images are saved as very long strings of characters (Base64-encoded) rather than as separate embedded files. Even short FODT files can run to 20 pages of text and take up a lot of disk space. On Windows you can manage this with NTFS compression.
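To see how much of a particular flat file is image data, you can measure its `office:binary-data` elements, which is where flat ODF files keep embedded images as Base64 text. Base64 uses roughly 4 characters for every 3 bytes, so the text is larger than the original image. Here is a minimal Python sketch; the function name and sample are hypothetical, and the count is approximate because it includes any line breaks inside the encoded data.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def embedded_binary_share(xml_text):
    """Return (total characters, characters inside office:binary-data)."""
    root = ET.fromstring(xml_text)
    binary_chars = sum(
        # Strip surrounding whitespace; internal line breaks are still counted.
        len("".join(el.itertext()).strip())
        for el in root.iter(f"{{{OFFICE_NS}}}binary-data")
    )
    return len(xml_text), binary_chars
```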

Rich Text Format similarities

Those familiar with RTF files know that you can open them in a text editor, but the markup is such a mess that it's hard to see what the file is supposed to say. LibreOffice files follow the OpenDocument standard, so the content and formatting are still complex but decipherable, much more so than Word's.

---

More on FODT: https://www.reddit.com/r/libreoffice/co ... ices_fodt/

#5 Post by Midas

Although I don't use LibreOffice, this might just be the tip that tips me into it, as I've always been a big fan of human-readable formats (not that XML is easy to read, but it is readable). 8)

#6 Post by webfork

Midas wrote: Fri Feb 26, 2021 10:28 am Although I don't use LibreOffice, this might just be the tip that tips me into it, as I've always been a big fan of human-readable formats (not that XML is easy to read, but is readable). 8)
Ah, fantastic! You would be the first person I know personally whom I've come close to converting. Even a friend of mine who works for a company that sponsors the program's development has stuck with Google Docs.

#7 Post by Midas

Mind you, my office needs have become residual over the years: opening the occasional mail-in, plus redacting a letter or doing a CSV/spreadsheet transform every now and then, for which SM Office amply suffices.

BTW, even though I really like the idea of GDocs (or online suites, to be more precise), I never got into them seriously.
