In: Computer Science
Calculate the input and output dimension for each convolution and pooling layer in DeepSEA (100pts)
Input Output dimensions in Convultion-
Input Shape- Give a 4D array as input to the CNN. So input data has a shape of (batch_size, height, width, depth), where the first dimension represents the batch size of the image and the other three dimensions represent dimensions of the image which are height, width, and depth. For some of you who are wondering what is the depth of the image, it’s nothing but the number of color channels. For example, an RGB image would have a depth of 3, and the greyscale image would have a depth of 1.
Output Shape- The output of the CNN is also a 4D array. Where batch size would be the same as input batch size but the other 3 dimensions of the image might change depending upon the values of filter, kernel size, and padding we use.
Let’s look at the following code snippet
Input_shape argument looks like out input shape is 3D, but you have to pass a 4D array at the time of fitting the data which should be like (batch_size, 10, 10, 3). Since there is no batch size value in the input_shape argument, we could go with any batch size while fitting the data.
As you can notice the output shape is (None, 10, 10, 64). The first dimension represents the batch size, which is None at the moment. Because the network does not know the batch size in advance. Once you fit the data, None would be replaced by the batch size you give while fitting the data.
Let’s look at another code snippet.
Here we
have replaced input_shape argument with batch_input_shape. As the name suggests, this argument will ask you the batch size in advance, and you can not provide any other batch size at the time of fitting the data. For example, you have to fit the data in the batch of 16 to the network only.
Now you can see that output shape also has a batch size of 16 instead of None.
----------------------------------------------------------------------------------------------------------------------------------------------------------
Input and Output dimensions for Pooling Layer-
The result of using a pooling layer and creating down sampled or pooled feature maps is a summarized version of the features detected in the input. They are useful as small changes in the location of the feature in the input detected by the convolutional layer will result in a pooled feature map with the feature in the same location. This capability added by pooling is called the model’s invariance to local translation.
In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change.
Example with pooling-
# example of average pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(AveragePooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
[[[0]],[[1]],[[0]]],
[[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
# print each column in the row
print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
Running the example first summarizes the model.
We can see from the model summary that the input to the pooling layer will be a single feature map with the shape (6,6) and that the output of the average pooling layer will be a single feature map with each dimension halved, with the shape (3,3).