In: Computer Science
Question 1: Provide a detailed analysis of Huffman's algorithm, taking care to state any assumptions.
(a) What problem is solved by Huffman coding?
(b) What key property of the input does Huffman coding
exploit?
(c) Write pseudo-code for constructing a Huffman tree from a table
of character frequencies.
(d) Analyze the complexity of this algorithm.
(e) Write C struct definitions for the data structures used by the
algorithm.
Question 2: Name and briefly describe a good algorithm for solving each of the following problems:
(a) searching in a sorted linked list
(b) sorting an array
(c) finding a pattern in a string
(d) minimal spanning tree of a weighted connected graph
(e) Hamiltonian circuit of a graph
Question 3: Suppose that a particular divide-and-conquer algorithm solves a problem of size n by splitting it into three problems of size n/3. Suppose that it must perform two basic operations in order to perform this split. Suppose it performs no basic operations when n = 1. Write down the recurrence relation for the time taken by this algorithm, then solve the recurrence using a method of your choice, showing all steps.
1)Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.The output from the algorithm can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file). This table is derived by the algorithm from the estimated probability or frequency of occurrence (weight) for each possible value of the source symbo. Although optimal among methods encoding symbols separately, Huffman coding is not always optimal among all compression methods - it is replaced with arithmetic coding or asymmetric numeral systems if better compression ratio is required.
a)
Huffman coding is the base for all data compression and encoding schemes.It follows a Greedy approach and is is a famous algorithm used for lossless data encoding. It deals with generating minimum length prefix-free binary codes.It uses variable-length encoding scheme for assigning binary codes to characters depending on how frequently they occur in the given text.The character which occurs most frequently gets the smallest code.The character which occurs least frequently gets the largest code.For encoding symbols/characters that appear more frequently in the input string shorter binary codes are generated.The binary codes generated are prefix-free.The main property of the algorithm is that it does not work with different denominations.
b)
The input with the following properties are exploited by the Huffman coding:
input with set of symbols to be transmitted or stored along with their frequencies/ probabilities/ weights
Alphabet
, which is the symbol alphabet of size
.
Tuple
, which is the tuple of the (positive) symbol weights
(usually proportional to probabilities), i.e.
.
c)
int bits; TreeNode current = root; // root of tree, constructed from header data while (true)
{
int bits = input.readBits(1); if (bits == -1)
{
throw new HuffException("bad input, no PSEUDO_EOF");
}
else
{
// use the zero/one value of the bit read
// to traverse Huffman coding tree
// if a leaf is reached, decode the character and print UNLESS
// the character is pseudo-EOF, then decompression done
if (bits == 0) current = current.left; // read a 0, go left
else // read a 1, go right
if (current.left == null && current.right == null)
{
// at leaf! if (leaf-node stores pseudo-eof char)
break; // out of loop
else
{
output.writeBits(... leaf-node value);
current = root; // start back after leaf
}
}
}
}
d)
The time complexity analysis of Huffman Coding is for extractMin( ) is called 2 x (n-1) times if there are n nodes.As extractMin( ) calls minHeapify( ), it takes O(logn) time.
Thus time complexity of Huffman Coding becomes O(nlogn).
n, here is the number of unique characters in the given text.