Question

In: Computer Science


Describe the compression/archive utilities listed below:
•   7-zip
•   gzip
•   rar
•   tar
•   zip

For each utility, do the following:
•   Write a description of how the utility operates
•   List the most common arguments & describe the effect of that argument
•   Briefly describe any compression algorithms implemented by the utilities

Create a summary table that compares and contrasts the different utilities.

Cite any references you will use to create the paper (in APA format).

PLEASE don't copy from other Chegg answers. Thanks!

Solutions

Expert Solution

An archive is a collection of historical records that are kept for long-term retention and used for future reference. Typically, archives contain data that is not actively used.

A backup is a copy of a data set, while an archive holds original data that has been removed from its original location (Dorion 2008).

The distinction is subtle; the main point is that an archive holds data that is meant to be removed from its active state, rather than a redundant copy of it.

TAR
Tar is an archive utility that simply stacks files sequentially, read from input or given as arguments, into a single file, with a small header per file recording its name, metadata, and where it starts and ends within the archive. It was developed for UNIX in 1979 to write to tape archive recorders (Deutsch and Aladdin Enterprises 1996). On its own it cannot span files across volumes, cannot compress, cannot encrypt, and cannot back up unnamed pipes. Compression is achieved on the fly by piping its output through compress or gzip. Modern versions of tar are linked to local UNIX compressors such as compress or gzip, and there are many forks such as gtar (GNU tar). This is confusing, because the command usually keeps the same name, and one cannot guess its capabilities without printing its usage with tar --help.

Option arguments do not always require dashes (“Tar(5) — Format Of Tape Archive Files” 2004).

tar usage (truncated)
Usage: tar [OPTION...] [FILE]...
GNU 'tar' saves many files together into a single tape or disk archive, and can
restore individual files from the archive.

Examples:
tar -cf archive.tar foo bar # Create archive.tar from files foo and bar.
tar -tvf archive.tar # List all files in archive.tar verbosely.
tar -xf archive.tar # Extract all files from archive.tar.

Main operation mode:

-A, --catenate, --concatenate append tar files to an archive
-c, --create create a new archive
-d, --diff, --compare find differences between archive and file system
--delete delete from the archive (not on mag tapes!)
-r, --append append files to the end of an archive
-t, --list list the contents of an archive
--test-label test the archive volume label and exit
-u, --update only append files newer than copy in archive
-x, --extract, --get extract files from an archive

Operation modifiers:

--check-device check device numbers when creating incremental
archives (default)
-g, --listed-incremental=FILE handle new GNU-format incremental backup
-G, --incremental handle old GNU-format incremental backup
--ignore-failed-read do not exit with nonzero on unreadable files
--level=NUMBER dump level for created listed-incremental archive
-n, --seek archive is seekable
--no-check-device do not check device numbers when creating
incremental archives
--no-seek archive is not seekable
--occurrence[=NUMBER] process only the NUMBERth occurrence of each file
in the archive; this option is valid only in
conjunction with one of the subcommands --delete,
--diff, --extract or --list and when a list of
files is given either on the command line or via
the -T option; NUMBER defaults to 1
--sparse-version=MAJOR[.MINOR]
set version of the sparse format to use (implies
--sparse)
-S, --sparse handle sparse files efficiently

tar examples
File list piped to tar (GNU tar reads the member names from stdin with -T -):

<command that produces file list> | tar cf backupfile.tar -T -

List the contents of a tar file:

tar tvf backupfile.tar

Back up a directory recursively:

tar cf backupfile.tar /path/to/backup

Back up a directory recursively + gzip max compression:

tar cf - /path/to/compress | gzip -9 > backupfile.tar.gz

Back up with gzip and rename directories inside the backup file:

tar zcvf backupfile.tar.gz --transform=s/path2rename/newName/ path2backup
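The create/list/extract cycle can be verified end to end. The sketch below uses illustrative file and directory names, and the -z shortcut that modern GNU tar provides in place of the explicit gzip pipe:

```shell
# Build a small directory tree to archive (names are illustrative).
mkdir -p demo/sub
echo "hello" > demo/file.txt
echo "world" > demo/sub/nested.txt

# Modern GNU tar compresses directly with -z (gzip) instead of piping.
tar -czf demo.tar.gz demo

# Extract into a separate directory with -C rather than the current one.
mkdir restored
tar -xzf demo.tar.gz -C restored

diff -r demo restored/demo   # no output means the trees match
```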

ZIP
Zip was created in 1989 by Phil Katz to replace competing formats such as ARC, and traditionally uses DEFLATE compression (just like gzip). It has been actively developed, maintained, and openly documented by PKWARE (PKWARE 2017). A de facto industry standard, it now handles more algorithms and supports file spanning and encryption (ZipCrypto and AES). It is the most popular archive format in the world, largely because DEFLATE is so fast at creating archives. Operating systems such as macOS and Windows use it to create compressed folders on the fly in their file explorers. Its primary behavior makes it a backup utility (Adler 2008).

Algorithms:

Deflate Standard LZ77-based algorithm
Deflate64 Standard LZ77-based algorithm
BZip2 Standard BWT algorithm
PPMD Dmitry Shkarin’s PPMdH with small changes
LZMA Improved and optimized version of LZ77 algorithm

zip usage
The UNIX zip utility is split into two binaries: zip to compress, unzip to decompress:

Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
The default action is to add or replace zipfile entries from list, which
can include the special name - to compress standard input.
If zipfile and list are omitted, zip compresses stdin to stdout.
-f freshen: only changed files -u update: only changed or new files
-d delete entries in zipfile -m move into zipfile (delete OS files)
-r recurse into directories -j junk (don't record) directory names
-0 store only -l convert LF to CR LF (-ll CR LF to LF)
-1 compress faster -9 compress better
-q quiet operation -v verbose operation/print version info
-c add one-line comments -z add zipfile comment
-@ read names from stdin -o make zipfile as old as latest entry
-x exclude the following names -i include only the following names
-F fix zipfile (-FF try harder) -D do not add directory entries
-A adjust self-extracting exe -J junk zipfile prefix (unzipsfx)
-T test zipfile integrity -X eXclude eXtra file attributes
-y store symbolic links as the link instead of the referenced file
-e encrypt -n don't compress these suffixes
-h2 show more help

unzip usage
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
Default action is to extract files in list, except those in xlist, to exdir;
file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).

-p extract files to pipe, no messages -l list files (short format)
-f freshen existing files, create none -t test compressed archive data
-u update files, create if necessary -z display archive comment only
-v list verbosely/show version info -T timestamp archive to latest
-x exclude files that follow (in xlist) -d extract files into exdir
modifiers:
-n never overwrite existing files -q quiet mode (-qq => quieter)
-o overwrite files WITHOUT prompting -a auto-convert any text files
-j junk paths (do not make directories) -aa treat ALL files as text
-U use escapes for all non-ASCII Unicode -UU ignore any Unicode fields
-C match filenames case-insensitively -L make (some) names lowercase
-X restore UID/GID info -V retain VMS version numbers
-K keep setuid/setgid/tacky permissions -M pipe through "more" pager
See "unzip -hh" or unzip.txt for more help. Examples:
unzip data1 -x joe => extract all files except joe from zipfile data1.zip
unzip -p foo | more => send contents of foo.zip via pipe into program more
unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer

zip examples
Create an archive:

zip archive.zip file(s)

Create an archive recursively:

zip -r archive.zip directory(s)

List archive content:

unzip -l archive.zip

Check integrity of an archive:

zip -T archive.zip

unzip -t archive.zip

Extract an archive:

unzip archive.zip

Update (refresh) files inside an archive:

zip -u archive.zip file(s)

Zip list of files returned from the find command:

find . -name "pattern" -print | zip archive.zip -@
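The encryption and file spanning mentioned above can be sketched as follows (Zip 3.0 flags; note that -P passes the password on the command line, which is convenient for scripting but visible to other users via the process list, while -e prompts interactively instead):

```shell
# Encrypt with legacy ZipCrypto (-P is scriptable but insecure; -e prompts).
zip -r -P s3cret secure.zip somedir/

# Split the archive into 100 MB volumes (file spanning, Zip 3.0+).
zip -r -s 100m spanned.zip somedir/

# Extract the encrypted archive into a target directory.
unzip -P s3cret secure.zip -d restored/
```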

gzip
Gzip is an inline compression utility that compresses data redirected through the terminal, or compresses files in place. The algorithm used is DEFLATE; gzip was developed in 1992 to replace the compress program from early UNIX systems (J.-L. Gailly 2003), and the "g" stands for GNU, the free-software project behind it (Free Software Foundation 2018). Because of its on-the-fly abilities, this format is the standard HTTP compression offered by virtually every web server today (J. Gailly 2017). It doesn't support file spanning. Its primary behavior (in-place compression) makes it an archive utility.

Algorithms:

Deflate Standard LZ77-based algorithm

gzip Usage
Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

-c, --stdout write on standard output, keep original files unchanged
-d, --decompress decompress
-f, --force force overwrite of output file and compress links
-h, --help give this help
-k, --keep keep (don't delete) input files
-l, --list list compressed file contents
-L, --license display software license
-n, --no-name do not save or restore the original name and time stamp
-N, --name save or restore the original name and time stamp
-q, --quiet suppress all warnings
-r, --recursive operate recursively on directories
-S, --suffix=SUF use suffix SUF on compressed files
-t, --test test compressed file integrity
-v, --verbose verbose mode
-V, --version display version number
-1, --fast compress faster
-9, --best compress better
--rsyncable Make rsync-friendly archive

With no FILE, or when FILE is -, read standard input.

gzip Examples
Compress a file in place and delete original file (produces file.gz):

gzip file

Compress file to file.gz and keep original:

gzip -k file

gzip -c file > file.gz

Decompress an archive and remove the .gz file:

gzip -d file.gz

Compress the output of a SQL dump into a gzipped file:

mysqldump --opt <database> | gzip -c > database.sql.gz
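Because gzip is stream-oriented, a compress/decompress round trip can be expressed entirely as a pipeline, and -l and -t inspect an archive without extracting it (file names below are illustrative):

```shell
# Round trip through a pipe: the output equals the input.
echo "some text some text some text" | gzip -c | gzip -dc

# -l lists compressed/uncompressed sizes and the compression ratio.
echo "some text some text some text" > note.txt
gzip -k note.txt        # -k keeps note.txt next to note.txt.gz
gzip -l note.txt.gz

# -t checks integrity without extracting (silent on success).
gzip -t note.txt.gz
```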

RAR
Another backup/compression utility, developed by Russian engineer Eugene Roshal in 1993, with a proprietary, licensed algorithm (win.rar GmbH 2018). It has been updated over time so it can decompress a variety of formats, and supports encryption and file spanning (win.rar 2018). On UNIX systems the rar utilities are split in two: while unrar is publicly available, installing rar requires an additional proprietary repository, which few users bother with since xz and 7-Zip are free and preferred. Because of its license, rar usage is clearly declining.

Algorithms:

RAR proprietary
LZSS Lempel-Ziv
Deflate Standard LZ77-based algorithm

Unrar usage
UNRAR 5.30 beta 4 freeware Copyright (c) 1993-2015 Alexander Roshal

Usage: unrar <command> -<switch 1> -<switch N> <archive> <files...>
<@listfiles...> <path_to_extract\>

<Commands>
e Extract files without archived paths
l[t[a],b] List archive contents [technical[all], bare]
p Print file to stdout
t Test archive files
v[t[a],b] Verbosely list archive contents [technical[all],bare]
x Extract files with full path

<Switches>
- Stop switches scanning
@[+] Disable [enable] file lists
ad Append archive name to destination path
ag[format] Generate archive name using the current date
ai Ignore file attributes
ap<path> Set path inside archive
c- Disable comments show
cfg- Disable read configuration
cl Convert names to lower case
cu Convert names to upper case
dh Open shared files
ep Exclude paths from names
ep3 Expand paths to full including the drive letter
f Freshen files
id[c,d,p,q] Disable messages
ierr Send all messages to stderr
inul Disable all messages
kb Keep broken extracted files
n<file> Additionally filter included files
n@ Read additional filter masks from stdin
n@<list> Read additional filter masks from list file
o[+|-] Set the overwrite mode
ol[a] Process symbolic links as the link [absolute paths]
or Rename files automatically
ow Save or restore file owner and group
p[password] Set password
p- Do not query password
r Recurse subdirectories
sc<chr>[obj] Specify the character set
sl<size> Process files with size less than specified
sm<size> Process files with size more than specified
ta<date> Process files modified after <date> in YYYYMMDDHHMMSS format
tb<date> Process files modified before <date> in YYYYMMDDHHMMSS format
tn<time> Process files newer than <time>
to<time> Process files older than <time>
ts<m,c,a>[N] Save or restore file time (modification, creation, access)
u Update files
v List all volumes
ver[n] File version control
vp Pause before each volume
x<file> Exclude specified file
x@ Read file names to exclude from stdin
x@<list> Exclude files listed in specified list file
y Assume Yes on all queries

Winrar examples
On Windows, installing the WinRAR software also gives access to command line utilities that can compress and decompress using the RAR algorithm. The unrar command takes the same arguments as on UNIX systems:

Create a zip backup:

winrar a -afzip backup.zip file(s)

Create a rar backup:

winrar a -r backup.rar file(s)

Create a rar backup recursively:

winrar a -r backup.rar path

Test backup integrity:

unrar t backup.rar

List backup content:

unrar va backup.rar

Extract specific files in a backup file + directory structure:

unrar x backup.rar *.ext [extractfolder\]

Extract backup without directory structure:

unrar e backup.rar [extractfolder\]
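If the proprietary rar binary is installed, the encryption and file spanning described above look like this (flag spellings per rar 5.x: -p<password> takes no space, -hp would also encrypt file names, -v<size> creates volumes; names are illustrative):

```shell
# Password-protect an archive (-hp instead of -p also encrypts file names).
rar a -ps3cret secure.rar file.txt

# Split a backup into 100 MB volumes (file spanning).
rar a -v100m spanned.rar bigdir/

# The freely available unrar extracts it.
unrar x -ps3cret secure.rar extracted/
```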

7-zip
7-Zip is a modern, free (Pavlov 2018a) backup/compression utility developed in 1999 by another Russian engineer, Igor Pavlov. It uses a variety of algorithms to compress and decompress many archive formats, including tar. Due to licensing, it can decompress RAR archives but not create them (Pavlov 2018b). Its flagship algorithm is LZMA, and today LZMA2 (Pavlov 2018c). It offers the best compression ratio available on the market, though at the cost of speed: it uses a huge dictionary combined with newer coding techniques, which explains both the ratio and why compression is so slow. Surprisingly, decompression is almost as fast as DEFLATE. It supports file spanning and AES encryption.

One can install p7zip for Linux, but there is a more common utility called xz that also uses LZMA/LZMA2 compression (Collin 2016). Just like gzip, it compresses single data streams: it has no ability to store multiple files (Himanshu 2012).

Algorithms:

LZMA2 Improved version of LZMA
LZMA Improved and optimized version of LZ77 algorithm
PPMD Dmitry Shkarin’s PPMdH with small changes
BZip2 Standard BWT algorithm
BCJ Converter for 32-bit x86 executables
BCJ2 Converter for 32-bit x86 executables
Deflate Standard LZ77-based algorithm

p7zip: Differences between 7z, 7za and 7zr binaries
The package p7zip usually includes three binaries, 7z, 7za, and 7zr (Vinet and Griffin 2017). Their differences are:

7z: 7z uses plugins to handle archives.
7za: is a stand-alone executable. 7za handles fewer archive formats than 7z, but does not need any other plugin.
7zr: is a stand-alone executable. 7zr does not need any other plugin, and is a light-version of 7za that only handles 7z archives.

7zip Usage
7-Zip (a) 9.38 beta Copyright (c) 1999-2014 Igor Pavlov 2015-01-03
p7zip Version 9.38.1 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)

Usage: 7za <command> [<switches>...] <archive_name> [<file_names>...]
[<@listfiles...>]

<Commands>
a : Add files to archive
b : Benchmark
d : Delete files from archive
e : Extract files from archive (without using directory names)
h : Calculate hash values for files
l : List contents of archive
rn : Rename files in archive
t : Test integrity of archive
u : Update files to archive
x : eXtract files with full paths
<Switches>
-- : Stop switches parsing
-ai[r[-|0]]{@listfile|!wildcard} : Include archives
-ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
-bd : Disable percentage indicator
-i[r[-|0]]{@listfile|!wildcard} : Include filenames
-m{Parameters} : set compression Method
-o{Directory} : set Output directory
-p{Password} : set Password
-r[-|0] : Recurse subdirectories
-scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
-sfx[{name}] : Create SFX archive
-si[{name}] : read data from stdin
-slt : show technical information for l (List) command
-so : write data to stdout
-ssc[-] : set sensitive case mode
-t{Type} : Set type of archive
-u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
-v{Size}[b|k|m|g] : Create volumes
-w[{path}] : assign Work directory. Empty path means a temporary directory
-x[r[-|0]]]{@listfile|!wildcard} : eXclude filenames
-y : assume Yes on all queries

Examples
Create a backup:

7z a backup.7z archiveDir

List backup content:

7z l backup.7z

Check integrity of a backup:

7z t backup.7z

Extract a backup:

7z e backup.7z

Update (refresh) files inside a backup:

7z u backup.7z backupDir

Create a max compression LZMA backup with tar and xz on the fly:

tar -cf - path/ | xz -9 -c - > archive.tar.xz
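The file spanning and AES encryption described above can be combined in one command. A sketch with illustrative names (-mx=9 is maximum compression, -mhe=on also encrypts the file names in the archive header, -v100m splits into 100 MB volumes):

```shell
# Max LZMA2 compression, AES-256 with encrypted headers, 100 MB volumes.
7z a -t7z -mx=9 -ps3cret -mhe=on -v100m backup.7z backupDir/

# With -v, output files are named backup.7z.001, backup.7z.002, ...
# Extract with full paths (-o sets the output directory, no space).
7z x -ps3cret backup.7z.001 -orestored
```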

Deflate Algorithm

Before trying to understand DEFLATE, it's important to understand the two compression strategies that make it up -- Huffman coding and LZ77 compression.

Huffman coding

Huffman coding is a form of prefix coding, which you may not think you know. But you've almost certainly used a prefix code -- when using the phone. Starting from the dial tone, you press a sequence of what may be five, seven, eight, eleven, twelve, or some other number of keys -- and each sequence of keys reaches another specific phone line.

Now suppose you're in an office setting with an internal switchboard, as many large companies have. All other phones within the company only require five digits to dial, instead of seven -- because it's expected that you'll be calling those numbers more often. You may still need to call outside numbers, though -- so all of those numbers have a `9' added to the front.

That's a prefix code. Each element that you could want to specify has a code made up of numbers, and because no code for one element begins with the code for any other element, you can type in that code and there will be no ambiguity about that being the one you mean.

A Huffman code is a prefix code prepared by a special algorithm. Here, instead of each code being a series of numbers between 0 and 9, each code is a series of bits, either 0 or 1. Instead of each code representing a phone, each code represents an element in a specific ``alphabet'' (such as the set of ASCII characters, which is the primary but not the only use of Huffman coding in DEFLATE).

A Huffman algorithm starts by assembling the elements of the ``alphabet,'' each one being assigned a ``weight'' -- a number that represents its relative frequency within the data to be compressed. These weights may be guessed at beforehand, or they may be measured exactly from passes through the data, or some combination of the two. In any case, the elements are selected two at a time, the elements with the lowest weights being chosen. The two elements are made to be leaf nodes of a node with two branches (I really hope you know nodes and trees...) Anyway, suppose we had a set of elements and weights that looked like this:

    A    16
    B    32
    C    32
    D     8
    E     8

We would pick D and E first, and make them branches of a single node -- one being the `0' branch, and one the `1' branch.

     ( )
  0 /   \ 1
   D     E

At this point, no element has been given its complete code yet, but we now know that the codes for D and E will be exactly the same, except for the last binary digit: D will end in 0 and E in 1.

The combined node D-and-E is placed back with the other (as yet) uncombined elements, and given a weight equal to the sum of its leaf nodes: in this case, 8 + 8 = 16. Again, we take the two nodes with lowest weight, which are A, and D-and-E, and join them into a larger node.

       ( )
    0 /   \ 1
    ( )    A
 0 /   \ 1
  D     E

This time, when the node A-D-E is put back into the list of elements, all remaining elements have the same weight, 32. Which two of the three are selected to be combined first is not important, at least not in the classical Huffman algorithm.

When all nodes have been recombined into a single ``Huffman tree,'' then by starting at the root and selecting 0 or 1 at each step, you can reach any element in the tree. Each element now has a Huffman code, which is the sequence of 0's and 1's that represents that path through the tree.

Now, it should be fairly easy to see how such a tree, and such a set of codes, could be used for compression. If compressing ordinary text, for example, probably more than half of the ASCII character set could be left out of the tree altogether. Frequently used characters, like `E' and `T' and `A,' will probably get much shorter codes, and even if some codes are actually made longer, they will be the ones that are used less often.

However, there is also the question: how do you pass the tree along with the encoded data? It turns out that there is a fairly simple way, if you modify slightly the algorithm used to generate the tree.

In the classic Huffman algorithm, a single set of elements and weights could generate multiple trees. In the variation used by the Deflate standard, there are two additional rules: elements that have shorter codes are placed to the left of those with longer codes. (In our previous example, D and E wind up with the longest codes, and so they would be all the way to the right.) Among elements with codes of the same length, those that come first in the element set are placed to the left. (If D and E end up being the only elements with codes of that length, then D will get the 0 branch and E the 1 branch, as D comes before E.)

It turns out that when these two restrictions are placed upon the trees, there is at most one possible tree for every set of elements and their respective codelengths. The codelengths are all that we need to reconstruct the tree, and therefore all that we need to transmit.

LZ77 compression

LZ77 compression works by finding sequences of data that are repeated. The term ``sliding window'' is used; all it really means is that at any given point in the data, there is a record of what characters went before. A 32K sliding window means that the compressor (and decompressor) have a record of what the last 32768 (32 * 1024) characters were. When the next sequence of characters to be compressed is identical to one that can be found within the sliding window, the sequence of characters is replaced by two numbers: a distance, representing how far back into the window the sequence starts, and a length, representing the number of characters for which the sequence is identical.

I realize this is a lot easier to see than to just be told. Let's look at some highly compressible data:

        Blah blah blah blah blah!

Our datastream starts by receiving the following characters: `B,' `l,' `a,' `h,' ` ,' and `b.' However, look at the next five characters:

         vvvvv
        Blah blah blah blah blah!
              ^^^^^

There is an exact match for those five characters in the characters that have already gone into the datastream, and it starts exactly five characters behind the point where we are now. This being the case, we can output special characters to the stream that represent a number for length, and a number for distance.

The data so far:

        Blah blah b

The compressed form of the data so far:

        Blah b[D=5,L=5]

The compression can still be increased, though taking full advantage of it requires a bit of cleverness on the part of the compressor. Look at the two strings that we decided were identical. Compare the character that follows each of them. In both cases, it's `l' -- so we can make the length six, not just five. But if we continue checking, we find the next characters, and the next characters, and the next characters, are still identical -- even though the so-called ``previous'' string is overlapping the string we're trying to represent in the compressed data!

It turns out that the 18 characters that start at the second character are identical to the 18 characters that start at the seventh character. It's true that when we're decompressing, and read the length, distance pair that describes this relationship, we don't know what all those 18 characters will be yet -- but if we put in place the ones that we know, we will know more, which will allow us to put down more... or, knowing that any length-and-distance pair where length > distance is going to be repeating (distance) characters again and again, we can set up the decompressor to do just that.

It turns out our highly compressible data can be compressed down to just this:

        Blah b[D=5, L=18]!
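This effect is easy to observe with gzip, whose DEFLATE stream uses exactly this LZ77 mechanism. On a tiny input, gzip's fixed header and trailer (about 18 bytes) can outweigh the savings, so scale the repetition up:

```shell
# 1000 copies of "Blah blah\n" -- 10000 bytes of raw input:
yes 'Blah blah' | head -n 1000 | wc -c          # prints 10000

# The length/distance pairs collapse the repeats to a few dozen bytes:
yes 'Blah blah' | head -n 1000 | gzip -9c | wc -c
```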

Putting it all together

The deflate compressor is given a great deal of flexibility as to how to compress the data. The programmer must deal with the problem of designing smart algorithms to make the right choices, but the compressor does have choices about how to compress data.

There are three modes of compression that the compressor has available:

  1. Not compressed at all. This is an intelligent choice for, say, data that's already been compressed. Data stored in this mode will expand slightly, but not by as much as it would if it were already compressed and one of the other compression methods was tried upon it.
  2. Compression, first with LZ77 and then with Huffman coding. The trees that are used to compress in this mode are defined by the Deflate specification itself, and so no extra space needs to be taken to store those trees.
  3. Compression, first with LZ77 and then with Huffman coding with trees that the compressor creates and stores along with the data.

The data is broken up in ``blocks,'' and each block uses a single mode of compression. If the compressor wants to switch from non-compressed storage to compression with the trees defined by the specification, or to compression with specified Huffman trees, or to compression with a different pair of Huffman trees, the current block must be ended and a new one begun.

The details of how the LZ77 and Huffman work together need some closer examination. Once the raw data has been turned into a string of characters and special length, distance pairs, these elements must be represented with Huffman codes.

Though this is NOT, repeat, NOT standard terminology, call the point where we start reading in bits a ``dial tone.'' After all, in our analogy, the dial tone is where you can start specifying a series of numbers that will end up mapping to a specific phone. So call the very beginning a ``dial tone.'' At that dial tone, one of three things could follow: a character, a length-distance pair, or the end of the block. Since we must be able to tell which it is, all the possible characters (``literals''), elements that indicate ranges of possible lengths (``lengths''), and a special end-of-block indicator are all merged into a single alphabet. That alphabet then becomes the basis of a Huffman tree. Distances don't need to be included in this alphabet, since they can only appear directly after lengths. Once the literal has been decoded, or the length-distance pair decoded, we are at another ``dial-tone'' point and we start reading again. If we got the end-of-block symbol, of course, we're either at the beginning of another block or at the end of the compressed data.

Length codes or distance codes may actually be a code that represents a base value, followed by extra bits that form an integer to be added to the base value.

Probably the trickiest part of the DEFLATE specification to understand is the way trees are encoded to go along with the data, when that data is compressed with specialized trees.

The trees are transmitted by their codelengths, as previously discussed. The codelengths are put all together into a sequence of numbers between 0 and 15 (the Huffman trees that are created must be kept to codelengths of no more than 15; this is the tricky part, not the part about constraining the order of the elements).

Not all the elements have to be given codelengths; if the last elements of an alphabet are of 0 codelengths, they can and probably should be left out. The number of elements in each of the two alphabets will be transmitted, so the trimmed alphabets go together into a single sequence.

Once this sequence of codelengths is assembled, it is compressed with a form of what is called run-length compression. When several elements in a row have the same codelength (often 0) special symbols may be used to indicate the number of elements with this codelength. Our sequence is now a sequence of numbers between 0 and 18 (possibly with extra bits forming integers to modify base values, as was the case with the length and distance codes).

A Huffman tree is created for this alphabet of 0-18. Sigh. The sequence of 0-18 codes and extra bits is prepared with the Huffman codes replacing the 0-18 elements.

This Huffman tree needs to be included with the data, of course. Like the other Huffman trees, it will be included by recording the codelengths of the elements. However... once again, *SIGH*. They are not recorded in the order 0, 1, 2, 3, 4 ... 16, 17, 18. They are recorded in a special order: 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15. The logic behind this is that the elements near the end of sequence are most likely to have 0 codelengths (i.e. not be in the tree at all). If they are indeed 0, they can be trimmed from the end of the sequence; the number of elements that are being directly recorded is also included in the data.

Those are the trouble points, I think, of the DEFLATE specs. Along with the specification itself and a bit of time, it should hopefully clear things up.

