c - Zlib minimum deflate size


I'm trying to figure out if there's a way to calculate the minimum required size of an output buffer, based on the size of the input buffer.

This question is similar to "zlib, deflate: How much memory to allocate?", but not the same. I am asking about each chunk in isolation, rather than the entire stream.

So suppose I have two buffers, INPUT and OUTPUT, and I have a BUFFER_SIZE of, say, 4096 bytes. (Just a convenient number, no particular reason I chose this size.)

If I deflate using:

deflate(stream, Z_PARTIAL_FLUSH)

so that each chunk is compressed and flushed to the output buffer, is there a way I can guarantee I'll have enough storage in the output buffer without needing to reallocate?

Superficially, we'd assume that the deflated data will always be smaller than the uncompressed input data (assuming we use a compression level greater than 0).

Of course, that's not always the case, especially for small values. For example, if we deflate a single byte, the deflated data will obviously be larger than the uncompressed data, due to the overhead of things like headers and dictionaries in the LZW stream.

Thinking about how LZW works, it would seem that if our input data is at least 256 bytes (meaning that in the worst case every single byte is different and we can't compress anything), we should realize that any input size of less than 256 bytes plus the zlib headers could potentially require a larger output buffer.

But generally, real-world applications aren't going to be compressing sizes that small. So, assuming an input/output buffer of something more like 4K, is there some way to guarantee that the output compressed data will be smaller than the input data?

(Also, I know about deflateBound, but would rather avoid it because of the overhead.)

Or, to put it another way, is there some minimum buffer size that I can use for the input/output buffers that will guarantee that the output data (the compressed stream) will be smaller than the input data? Or is there always some pathological case that can cause the output stream to be larger than the input stream, regardless of size?

Though I can't quite make heads or tails out of your question, I can comment on parts of the question in isolation.

> is there some way to guarantee that the output compressed data will be smaller than the input data?

Absolutely not. It will always be possible for the compressed output to be larger than some input. Otherwise you wouldn't be able to compress other input.

> (Also, I know about deflateBound, but would rather avoid it because of the overhead.)

Overhead? Seriously? We're talking about a fraction of a percent larger than the input buffer for reasonable sizes.

By the way, deflateBound() provides a bound on the size of the entire output stream as a function of the size of the entire input stream. It can't help you when you are in the middle of a bunch of deflate() calls with incomplete input and insufficient output space. For example, you may still have deflate output pending that is delivered by the next deflate() call, without providing any new input at all. Then the expansion ratio is infinite for that isolated call.

> due to the overhead of things like headers and dictionaries in the LZW stream.

Deflate is not LZW. The approach it uses is called LZ77. It is very different from LZW, which is now obsolete. There are no "dictionaries" stored in compressed deflate data. The "dictionary" is simply the uncompressed data that precedes the data currently being compressed or decompressed.

> Or, to put it another way, is there some minimum buffer size that I can use for input/output buffers ...

The whole idea behind the zlib interface is for you to not have to worry about what fits in the buffers. You just keep calling deflate() or inflate() with more input data and more output space until you're done, and all will be well. It does not matter if you need to make more than one call to consume one buffer of input, or more than one call to fill one buffer of output. You just have loops to make more calls, provide more input when needed, and disposition the output when needed and provide fresh output space.

