

lclean - Detect and clean trash in Structured Storage documents


lclean -r || -c || -s || -i || -l || -e document

Note! If you use switch -c or -i, lclean changes your document! Please keep a backup of the treated documents until you are sure they were not harmed!


lclean deals with documents typically created with MS Windows applications. It reports on the trash sections in those "OLE / Structured Storage" documents, cleans this trash or saves it to files. Furthermore, a file can be hidden in and extracted from those trash sections.


lclean -c [-n] {document}

The trash sections will be cleaned. Unused blocks are filled with null bytes. System data will be cleaned with 0xff bytes. File end trash (type 4) will be cleaned with random bytes. When using switch -n, file end trash will also be filled with zero bytes (faster on files with lots of embedded objects).
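The fill rules above can be summarised in a few lines of Python. This is only an illustrative sketch, not lclean's actual code: the function name fill_bytes and its parameters are hypothetical, and the numeric trash types are the ones listed under the trash type descriptions further down.

```python
import os


def fill_bytes(trash_type: int, length: int, zero_file_end: bool = False) -> bytes:
    """Return filler for a trash region of `length` bytes (hypothetical helper).

    trash_type uses the numbering from this page: 1 and 2 are unused
    big/small blocks, 4 is file end space, 8 is system space.
    """
    if trash_type in (1, 2):
        # Unused blocks are filled with null bytes.
        return b"\x00" * length
    if trash_type == 8:
        # System data is cleaned with 0xff bytes.
        return b"\xff" * length
    if trash_type == 4:
        # File end trash gets random bytes, or zero bytes when -n is given.
        return b"\x00" * length if zero_file_end else os.urandom(length)
    raise ValueError(f"unknown trash type: {trash_type}")
```

Here zero_file_end stands in for switch -n. Writing zeros avoids generating random data for every stream, which is presumably why the text calls -n faster on files with many embedded objects.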


lclean -e [-f] [-z] {document}

Extracts the hidden file. This makes a copy of the hidden file. If the file got corrupted for whatever reason, it will not be extracted. The extracted file will get the date of its last modification. If the file already exists, you will be asked whether to overwrite it with the newly extracted file. With switch -f you will not be asked, and the file will be overwritten. With switch -z no zero length files will be created.


lclean -i <file> [-a] {document}

Insert a file into the document (hide it). The trash in your document will be replaced by <file>. This file cannot be seen by any standard Windows application. The file (plus 20 bytes plus the size needed to store the filename) must be smaller than the size of the trash in your document. Normally, only trash types 1 and 2 will be used for this. If they do not offer enough space, trash types 4 and 8 will be used additionally when switch -a is given.
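The size rule can be checked with a little arithmetic. The following Python sketch is an illustration under assumptions, not lclean's implementation: the function fits_in_trash and the trash_sizes mapping are hypothetical, the filename is assumed to cost one byte per encoded character, and a payload that exactly fills the trash is assumed to fit.

```python
HEADER_OVERHEAD = 20  # bytes of bookkeeping, as stated in the text above


def fits_in_trash(file_size: int, filename: str, trash_sizes: dict,
                  use_all: bool = False) -> bool:
    """Check whether a file could be hidden (hypothetical helper).

    trash_sizes maps a trash type (1, 2, 4, 8) to the bytes available
    in that type. use_all mirrors switch -a.
    """
    # Without -a only types 1 and 2 are used, with -a also types 4 and 8.
    usable_types = (1, 2, 4, 8) if use_all else (1, 2)
    available = sum(trash_sizes.get(t, 0) for t in usable_types)
    # Assumption: the stored filename takes its encoded length in bytes.
    needed = file_size + HEADER_OVERHEAD + len(filename.encode())
    return needed <= available
```

For example, with 1024 bytes of type 1 trash and 128 bytes of type 2 trash, a 1000 byte file named "a.txt" needs 1025 bytes and fits, while a 1200 byte file would only fit if types 4 and 8 supply the missing space.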


lclean -l {document}

Reports whether there is a file hidden in the document's trash.


lclean -r {document}

Prints a short report about the trash in the document.


lclean -s [-a] [-d] [-z] {document}

The trash sections will be saved to separate files. They will be stored in a directory inside your current directory, normally the directory "doctrash". E.g., for the example file "legacy.doc" the trash files will be stored as: "doctrash/legacy.tr1", "doctrash/legacy.tr2", "doctrash/legacy.tr3" and "doctrash/legacy.tr4".


With switch -a, all trash will be stored in one big file "doctrash/legacy.tra".


With switch -d, the trash file(s) will not be stored in directory "doctrash", but each in a directory of its own, e.g. in directory "legacy/".


When using switch -z, zero length files will not be created.


lclean -cs {document}

This would first save all trash chunks to separate files in directory "doctrash", then it would clean the document.

lclean -aci <file> {document}

This would first clean the document, then insert the hidden file by using all trash types.


Microsoft's first and still most widespread OLE implementations had bugs. One of them caused some sections of documents that should have been filled with zero bytes to contain more or less private data.

Management of OLE documents is a little bit difficult and takes some time. One way to speed this up is not to care about the old data, but simply to append the new data to the document. Cleaning up can be done later. Microsoft Word uses this strategy when the "fast save" option is switched on. Files saved this way contain both the new and the old version of a document. The old data cannot be edited any more and stays invisible in the document.

Some programs seem to use the Microsoft OLE library improperly. For example, the Star Office 3.1 programs create documents that always contain 1024 bytes of trash.

As far as I know, Microsoft offers a bugfix for 32 bit Windows systems only.

The program "lclean" can access this kind of garbage in OLE documents. To do this it uses the modify_trash method of OLE::Storage. This library distinguishes between four different types of trash.


Type 1

Unused "big blocks". These blocks are not used by the document. Each of these blocks is 512 bytes long.

Type 2

Unused "small blocks". These blocks are not used by the document. Each of these blocks is 64 bytes long.

Type 4

File end space. This refers to the "streams" of an OLE document. It is made up of the space between the end of each stream and the end of the last block that stream occupies.

Type 8

System space. These sections are required by the file format, though they are not used by the OLE system. Strictly speaking, this data is not garbage.
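The four type numbers are powers of two, so they can be combined like bit flags, and the file end space of a single stream follows directly from the block sizes given above. The Python sketch below illustrates both points. It is not taken from OLE::Storage: the names TrashType and file_end_slack are hypothetical, and treating the types as a combinable bitmask is an assumption suggested only by the numbering.

```python
from enum import IntFlag


class TrashType(IntFlag):
    """Trash types as numbered on this page (names are hypothetical)."""
    BIG_BLOCKS = 1    # unused 512 byte blocks
    SMALL_BLOCKS = 2  # unused 64 byte blocks
    FILE_END = 4      # space between stream end and block end
    SYSTEM = 8        # required by the format, unused by OLE


BIG_BLOCK_SIZE = 512
SMALL_BLOCK_SIZE = 64


def file_end_slack(stream_size: int, block_size: int) -> int:
    """Bytes left between the end of a stream and the end of its last block."""
    # Negating before the modulo yields the distance up to the next
    # multiple of block_size, and 0 when the stream ends on a boundary.
    return -stream_size % block_size


# Types used for hiding a file by default, and with switch -a.
DEFAULT_HIDE = TrashType.BIG_BLOCKS | TrashType.SMALL_BLOCKS
ALL_TYPES = DEFAULT_HIDE | TrashType.FILE_END | TrashType.SYSTEM
```

A 1000 byte stream stored in big blocks occupies two blocks (1024 bytes), so file_end_slack(1000, BIG_BLOCK_SIZE) is 24 bytes of type 4 trash. A stream that ends exactly on a block boundary leaves none.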




Martin Schwartz <schwartz@cs.tu-berlin.de>.
