Perhaps I should make it clear that ZIP (well, Lempel-Ziv, in fact) compression is universal: you can prove that, for a sufficiently long text from a stationary source, it gets as close to the maximum possible compression as you want.
The real issues are: 1. Working with small files (you can settle on a good encoding in advance rather than deriving it from the text itself, and otherwise reduce the overhead of ZIP). 2. Compressing the data while keeping the structure available in 'clear-text', so that it can be used without decompressing (this can be done by deciding on fixed encodings for the structural elements).
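To illustrate point 1, here is a minimal sketch using the preset-dictionary feature of Python's zlib module (the `zdict` parameter). The dictionary contents, the sample record, and the helper names `compress_small` and `decompress_small` are made up for the illustration; the idea is only that both sides agree on the dictionary beforehand, so a short file can refer back to strings it has never spelled out itself.

```python
import zlib

# Hypothetical shared dictionary: markup we expect in every small file.
# Both the compressor and the decompressor must hold the same bytes.
SHARED_DICT = b"<record><name></name><value></value></record>"


def compress_small(data: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Deflate `data` with the encoder pre-loaded with `zdict`."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_small(blob: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Inverse of compress_small; needs the very same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    sample = b"<record><name>foo</name><value>42</value></record>"
    plain = zlib.compress(sample, 9)
    primed = compress_small(sample)
    print(len(sample), len(plain), len(primed))
    assert decompress_small(primed) == sample
```

For a record this short, the primed stream should come out noticeably smaller than the plain one, because the tags are encoded as back-references into the dictionary instead of as literals. This does nothing for point 2: the output is still an opaque stream.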