Question


1:Find the convolution output volume size of layer 2 (W2xH2xD2) and 3 (W3xH3xD3)

2: Find the convolution output volume size of layer 2 (W2xH2xD2) and 3 (W3xH3xD3)

3: Find the output volume size of output pooling layer (W2xH2xD2)

Solutions

Expert Solution

1. Convolution output volume size of layer 2 (W2xH2xD2) and 3 (W3xH3xD3)

The output of a convolution layer is computed as follows:

The depth (number of feature maps) is equal to the number of filters applied in this layer.

The width (and likewise the height) is computed according to the following equation:

W2=(W1−F+2P)/S+1, where W1 is the input width, F is the receptive field (filter width), P is the padding and S is the stride.

  • Accepts a volume of size W1×H1×D1
  • Requires four hyperparameters:
    • Number of filters K,
    • their spatial extent F,
    • the stride S,
    • the amount of zero padding P.
  • Produces a volume of size W2×H2×D2 where:
    • W2=(W1−F+2P)/S+1
    • H2=(H1−F+2P)/S+1 (i.e. width and height are computed equally by symmetry)
    • D2=K
  • With parameter sharing, it introduces F·F·D1 weights per filter, for a total of (F·F·D1)·K weights and K biases.
  • In the output volume, the d-th depth slice (of size W2×H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by the d-th bias.
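As a rough illustration of the formulas above, the following sketch computes the output volume and the parameter count of a convolution layer. The function names and the example sizes (a 32×32×3 input with K=10, F=5, S=1, P=2) are assumptions chosen for illustration; they are not the values from the question's figure, which is not reproduced here.

```python
def conv_output_size(w1, h1, d1, k, f, s, p):
    """Return (W2, H2, D2) for a conv layer with K filters of size F,
    stride S and zero padding P. d1 is accepted for symmetry with the
    W1 x H1 x D1 notation; it does not affect the output size."""
    def spatial(n):
        numerator = n - f + 2 * p
        # The formula only gives a whole number when the stride divides evenly.
        if numerator < 0 or numerator % s != 0:
            raise ValueError("filter does not fit the input with this stride/padding")
        return numerator // s + 1
    return spatial(w1), spatial(h1), k


def conv_param_count(d1, k, f):
    """Return (weights, biases): F*F*D1 weights per filter, K filters, K biases."""
    return f * f * d1 * k, k


print(conv_output_size(32, 32, 3, k=10, f=5, s=1, p=2))  # (32, 32, 10)
print(conv_param_count(d1=3, k=10, f=5))                 # (750, 10)
```

To answer a question like the one above, the same function can be applied layer by layer, feeding each layer's output volume in as the next layer's input.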

A common setting of the hyperparameters is F=3, S=1, P=1. However, there are common conventions and rules of thumb that motivate these hyperparameters.

It is common to periodically insert a Pooling layer in-between successive Conv layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation to reduce the amount of parameters and computation in the network, and hence to also control overfitting. The Pooling Layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation would in this case be taking a max over 4 numbers (a little 2x2 region in some depth slice). The depth dimension remains unchanged. More generally, the pooling layer:

  • Accepts a volume of size W1×H1×D1
  • Requires two hyperparameters:
    • the spatial extent F,
    • the stride S,
  • Produces a volume of size W2×H2×D2 where:
    • W2=(W1−F)/S+1
    • H2=(H1−F)/S+1
    • D2=D1
  • Introduces zero parameters since it computes a fixed function of the input
  • Note that it is not common to use zero-padding for Pooling layers
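The pooling formulas can be sketched the same way. The function name and the example inputs below are illustrative assumptions, not values taken from the question.

```python
def pool_output_size(w1, h1, d1, f, s):
    """Return (W2, H2, D2) for a pooling layer with spatial extent F and stride S.
    Depth is unchanged because pooling acts on each depth slice independently."""
    def spatial(n):
        if n < f or (n - f) % s != 0:
            raise ValueError("pooling window does not tile the input evenly")
        return (n - f) // s + 1
    return spatial(w1), spatial(h1), d1


print(pool_output_size(224, 224, 64, f=2, s=2))  # (112, 112, 64)
print(pool_output_size(55, 55, 96, f=3, s=2))    # (27, 27, 96), overlapping pooling
```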

It is worth noting that there are only two commonly seen variations of the max pooling layer found in practice: a pooling layer with F=3, S=2 (also called overlapping pooling), and more commonly F=2, S=2. Pooling sizes with larger receptive fields are too destructive.

General pooling. In addition to max pooling, the pooling units can also perform other functions, such as average pooling or even L2-norm pooling. Average pooling was often used historically but has recently fallen out of favor compared to the max pooling operation, which has been shown to work better in practice.

2.

2D Convolutional Layers constitute Convolutional Neural Networks (CNNs) along with Pooling and fully-connected layers and create the basis of deep learning. So if you want to go deeper into CNNs and deep learning, the first step is to get more familiar with how Convolutional Layers work. If you are not familiar with applying 2D filters on images, it helps to review image filtering first. In the image filtering post, we talked about convolving a filter with an image. In that post, we had a 2D filter kernel (a 2D matrix) and a single channel image (grayscale image). To calculate the convolution, we swept the kernel over the image and calculated the output at every single location (strictly, the kernel should be flipped before the convolution; for the rest of this post we assume that the kernel is already flipped). In fact, the stride of our convolution was 1. What is a stride? The stride is the number of pixels by which we slide our filter, horizontally or vertically. In other words, in that case we moved our filter one pixel at each step to calculate the next convolution output. For a convolution with stride 2, however, we calculate the output for every other pixel (we jump 2 pixels), and as a result the output of the convolution is roughly half the size of the input image. Figure 1 compares two 2D convolutions with strides one and two, respectively.

Note that you can have different strides horizontally and vertically. You can use the following equations to calculate the exact size of the convolution output for an input with the size of (width = W, height = H) and a filter with the size of (width = Fw, height = Fh):

output width=(W−Fw+2P)/Sw+1

output height=(H−Fh+2P)/Sh+1

where Sw and Sh are the horizontal and vertical strides of the convolution, respectively, and P is the amount of zero padding added to the border of the image (look at the previous post if you are not familiar with the zero padding concept). However, the output width or height calculated from these equations might be a non-integer value. In that case, you have to handle the situation in some way to obtain the desired output dimension. Here, we explain how Tensorflow approaches the issue. In general you have two main options for the padding scheme, which determine the output size, namely the 'SAME' and 'VALID' padding schemes. In the 'SAME' padding scheme, in which we have zero padding, the size of the output will be

output height=ceil(H/Sh)

output width=ceil(W/Sw)

If the number of padding pixels required to reach the desired output size is an even number, we can simply add half of them to each side of the input (left and right, or top and bottom). However, if it is an odd number, we need an uneven number of zeros on the left and the right sides of the input (for horizontal padding) or the top and the bottom sides of the input (for vertical padding). Here is how Tensorflow calculates the required padding on each side:

padding along height=Ph=max((output height−1)×Sh+Fh−H, 0)

padding along width=Pw=max((output width−1)×Sw+Fw−W, 0)

padding top=Pt=floor(Ph/2)

padding left=Pl=floor(Pw/2)

padding bottom=Ph−Pt

padding right=Pw−Pl

Similarly, in the 'VALID' padding scheme, in which we do not add any zero padding to the input, the size of the output would be

output height=ceil((H−Fh+1)/Sh)

output width=ceil((W−Fw+1)/Sw)
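The 'SAME' and 'VALID' formulas above can be checked with a small pure-Python sketch. The helper names `same_padding` and `valid_output_size` are hypothetical and are not part of the Tensorflow API; they only reproduce the arithmetic described in the text.

```python
import math


def same_padding(h, w, fh, fw, sh, sw):
    """Output size and per-side zero padding for the 'SAME' scheme."""
    out_h = math.ceil(h / sh)
    out_w = math.ceil(w / sw)
    pad_h = max((out_h - 1) * sh + fh - h, 0)  # total padding along height
    pad_w = max((out_w - 1) * sw + fw - w, 0)  # total padding along width
    top, left = pad_h // 2, pad_w // 2         # floor of half goes to top/left
    return {
        "output": (out_h, out_w),
        "top": top, "bottom": pad_h - top,     # any extra row goes to the bottom
        "left": left, "right": pad_w - left,   # any extra column goes to the right
    }


def valid_output_size(h, w, fh, fw, sh, sw):
    """Output size for the 'VALID' scheme (no zero padding)."""
    return math.ceil((h - fh + 1) / sh), math.ceil((w - fw + 1) / sw)


# A 6x6 input, 3x3 filter, stride 2: total padding is odd (1), so it is uneven.
print(same_padding(6, 6, 3, 3, 2, 2))
# {'output': (3, 3), 'top': 0, 'bottom': 1, 'left': 0, 'right': 1}
print(valid_output_size(6, 6, 3, 3, 2, 2))  # (2, 2)
```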

Let's get back to the Convolutional layer. A convolution layer does exactly the same: it applies a filter to an input in a convolutional manner. Like a Fully-Connected layer, a Convolutional layer has a weight, which is its kernel (filter), and a bias. But in contrast to fully-connected layers, in convolutional layers each pixel (or neuron) of the output is connected to the input pixels (neurons) locally instead of being connected to all input pixels (neurons). Hence, we use the term receptive field for the size of a convolutional layer's filter.

The bias in a convolutional layer is a single scalar value which is added to the output of the Convolutional Layer's filter at every single pixel. What we have talked about so far was in fact a Convolutional layer with 1 input and 1 output channel (also known as depth) and a zero bias. Generally, a convolution layer can have multiple input channels (each a 2D matrix) and multiple output channels (again each a 2D matrix). Perhaps the most tangible example of a multi-channel input is a color image, which has 3 RGB channels. Let's feed it to a convolution layer with 3 input channels and 1 output channel. How does the layer calculate the output? The short answer is that it has 3 filters (one for each input channel) instead of one. It calculates the convolution of each filter with its corresponding input channel (first filter with first channel, second filter with second channel and so on). The stride is the same for all channels, so they output matrices with the same size. It then sums up all the matrices and outputs a single matrix, which is the only channel at the output of the convolution layer.

What about when the convolution layer has more than one output channel? In that case, the layer has a different multi-channel filter (whose number of channels is equal to the number of input channels) to calculate each output. For example, assume we have a layer with three input channels (RGB) and five output channels. This layer would have 5 filters, with 3 channels per filter. It uses each filter (3 channels) to compute the corresponding output from the input channels. In other words, it uses the first 3-channel filter to calculate the first channel of the output and so on. Note that each output channel has its own bias. Therefore, the number of biases in each Convolutional layer is equal to the number of output channels. Counting both weights and biases, the number of parameters in the layer is:

number of parameters=(Fw×Fh×di+1)×do

where di and do are the depth (number of channels) of the input and the depth of the output, respectively. Note that the one inside the parentheses counts the biases.
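A minimal NumPy sketch of the multi-channel case described above: each output channel sums the per-channel filter responses and adds its own bias. The function name `conv2d_multi`, the array layout (channels first) and the random example data are assumptions made for illustration; the loops favor clarity over speed, and no zero padding is applied.

```python
import numpy as np


def conv2d_multi(x, w, b, stride=1):
    """x: (di, H, W) input, w: (do, di, Fh, Fw) filters, b: (do,) biases.
    Kernels are assumed to be already flipped, as in the text."""
    di, h, wd = x.shape
    do, di_w, fh, fw = w.shape
    assert di == di_w, "each filter needs one channel per input channel"
    out_h = (h - fh) // stride + 1
    out_w = (wd - fw) // stride + 1
    y = np.zeros((do, out_h, out_w))
    for o in range(do):
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                patch = x[:, r:r + fh, c:c + fw]
                # Sum over all input channels and filter cells, then add the bias.
                y[o, i, j] = np.sum(patch * w[o]) + b[o]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))     # RGB-like input, di = 3
w = rng.standard_normal((5, 3, 3, 3))  # do = 5 filters, 3 channels each, 3x3
b = rng.standard_normal(5)             # one bias per output channel
y = conv2d_multi(x, w, b)
n_params = w.size + b.size             # (Fw * Fh * di + 1) * do
print(y.shape, n_params)               # (5, 6, 6) 140
```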

  • Convolution layers with 1X1 filter size: Even though using a 1X1 filter does not make sense at first glance from an image processing point of view, it can help by adding nonlinearity to your network. In fact, a 1X1 filter calculates a linear combination of all corresponding pixels (neurons) of the input channels and passes the result through an activation function, which adds the nonlinearity.
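To make the 1X1 case concrete, here is a small sketch in which every output pixel is a linear combination of the input channels at the same location, followed by an activation. ReLU is used as the activation purely as an assumption (the text does not name one), and the function name and shapes are illustrative.

```python
import numpy as np


def conv1x1_relu(x, w, b):
    """x: (di, H, W), w: (do, di), b: (do,). Returns (do, H, W)."""
    # Contract the channel axis: out[o, i, j] = sum_c w[o, c] * x[c, i, j]
    z = np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]
    return np.maximum(z, 0.0)  # ReLU, assumed here as the activation function


rng = np.random.default_rng(1)
x1 = rng.standard_normal((3, 4, 4))
w1 = rng.standard_normal((2, 3))
b1 = np.zeros(2)
out = conv1x1_relu(x1, w1, b1)
print(out.shape)  # (2, 4, 4): spatial size unchanged, depth mapped from 3 to 2
```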

3.

Convolutional Neural Networks (CNN, or ConvNets)

Convolutional Neural Networks allow computers to see. In other words, ConvNets are used to recognize images by transforming the original image through layers into class scores. The CNN was inspired by the visual cortex. Every time we see something, a series of layers of neurons gets activated, and each layer detects a set of features such as lines and edges. The higher layers detect more complex features in order to recognize what we saw.

Input (the training data):

  • The input layer or input volume is an image that has the following dimensions: [width x height x depth]. It is a matrix of pixel values.
  • Example: Input: [32x32x3] => (width=32, height=32, depth=3). The depth here represents the R, G, B channels.
  • The input layer size should be divisible by 2 many times. Common numbers include 32, 64, 96, 224, 384, and 512.

Only a part of the image is connected to the next Conv layer, because if all the pixels of the input were connected to the Conv layer, it would be too computationally expensive. So we apply dot products between a receptive field and a filter across all the dimensions. The outcome of this operation is a single number of the output volume (feature map). Then we slide the filter over the next receptive field of the same input image by a stride and compute the dot product again. The size of the resulting output volume is:

  • W2=(W1−F+2P)/S+1
  • H2=(H1−F+2P)/S+1
  • D2=K
  • [W1xH1xD1]: input volume size
  • F: receptive field size
  • S: stride
  • P: amount of zero padding used on the border.
  • K: number of filters (the depth of the output volume)

Parameter Sharing (shared weights): We assume that if a feature is useful in one place, it will also be useful to look for it everywhere in the image. However, in some cases it does not make sense to share the same weights. For example, in training data that contains centered faces, we don't have to look for eyes at the bottom or the top of the picture.

Dilation is a further hyperparameter of the Conv layer. A dilated filter has spaces between its cells. For example, suppose we have a one-dimensional filter w of size 3 and an input x:

  • Dilation of 0: w[0]*x[0] + w[1]*x[1] + w[2]*x[2].
  • Dilation of 1: w[0]*x[0] + w[1]*x[2] + w[2]*x[4].
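The two cases above can be reproduced with a short sketch. It follows the text's convention, where a dilation of 0 means contiguous cells; note that some frameworks (PyTorch, for example) count from 1 instead, so that their dilation of 1 corresponds to the contiguous case here. The function name and the sample values are illustrative assumptions.

```python
def dilated_dot(w, x, dilation=0, start=0):
    """Dot product of filter w with input x, skipping `dilation` cells
    between consecutive filter taps (0 = ordinary, contiguous filter)."""
    step = dilation + 1
    return sum(w[k] * x[start + k * step] for k in range(len(w)))


w = [1, 2, 3]
x = [10, 20, 30, 40, 50]
print(dilated_dot(w, x, dilation=0))  # 1*10 + 2*20 + 3*30 = 140
print(dilated_dot(w, x, dilation=1))  # 1*10 + 2*30 + 3*50 = 220
```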

POOL layer:

Pool Layer performs a function to reduce the spatial dimensions of the input, and the computational complexity of our model. And it also controls overfitting. It operates independently on every depth slice of the input. There are different functions such as Max pooling, average pooling, or L2-norm pooling. However, Max pooling is the most used type of pooling which only takes the most important part (the value of the brightest pixel) of the input volume.
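As an illustration, max pooling on a single depth slice can be sketched in plain Python. The function name and the 4×4 sample slice are made up for the example; a full layer would apply the same operation to every depth slice independently.

```python
def max_pool2d(slice2d, f=2, s=2):
    """Max pooling over one depth slice given as a list of rows."""
    h, w = len(slice2d), len(slice2d[0])
    out_h = (h - f) // s + 1
    out_w = (w - f) // s + 1
    return [
        [
            # Maximum over the f x f window whose top-left corner is (i*s, j*s).
            max(slice2d[i * s + di][j * s + dj] for di in range(f) for dj in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


depth_slice = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(depth_slice))  # [[6, 8], [3, 4]]
```

With F=2 and S=2, the 16 input values are reduced to 4, which matches the 75% reduction in activations mentioned earlier.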

Fully_Connected Layer (FC):

Fully connected layers connect every neuron in one layer to every neuron in another layer. The last fully-connected layer uses a softmax activation function for classifying the generated features of the input image into various classes based on the training dataset.
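For reference, the softmax function used by the last fully-connected layer turns raw class scores into probabilities that sum to 1. The sketch below is a generic, numerically stabilized version; the score values are invented for the example.

```python
import math


def softmax(scores):
    """Convert a list of raw class scores into probabilities."""
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))      # index of the most probable class
print(predicted_class)                         # 0
```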

The Pool layer doesn't have parameters (the weights and biases of the neurons) and uses no zero padding, but it has two hyperparameters: filter size (F) and stride (S). More generally, given the input W1×H1×D1, the pooling layer produces a volume of size W2×H2×D2 where:

  • W2=(W1−F)/S+1
  • H2=(H1−F)/S+1
  • D2=D1
