In: Computer Science
You are going to secure messages sent between A and B;
confidentiality is not an issue, but
integrity is. The number of messages will be high, so the algorithm
needs to be computationally
efficient. Describe a solution for this, including the algorithms to be
used and the management of
keys.
Kindly answer this question in the field of Applied Computer
Security
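Since only integrity (not confidentiality) is required and the message volume is high, the standard solution is a message authentication code (MAC) computed over each message with a symmetric key shared by A and B, for example HMAC with a fast hash such as SHA-256. A minimal sketch using Python's standard library (the key handling and the sample message are illustrative; distributing the shared key is assumed to happen separately, e.g. out of band or via an authenticated key-agreement protocol):

```python
import hmac
import hashlib
import os

# Shared secret key, known only to A and B. How it is distributed
# (out of band, or via an authenticated key exchange) is assumed
# to be handled separately.
key = os.urandom(32)

def tag(message: bytes) -> bytes:
    # HMAC-SHA-256: computationally cheap, provides integrity and
    # authenticity of each message
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, received_tag: bytes) -> bool:
    # constant-time comparison avoids timing side channels
    return hmac.compare_digest(tag(message), received_tag)

msg = b"transfer 100 units to account 42"
t = tag(msg)
assert verify(msg, t)             # untampered message accepted
assert not verify(msg + b"0", t)  # any modification is detected
```

For key management, the shared key should come from a strong random source, be rotated periodically, and be exchanged over an authenticated channel; per-message HMAC computation itself is fast enough for high message rates.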
What counts as computationally efficient depends upon multiple factors:
Performance Evaluation
Performance under Local Traffic
Image 1 shows the average message latency versus normalized
accepted traffic when messages are sent locally. In this case,
message destinations are uniformly distributed inside a cube
centered at the source node with each side equal to four
channels.
The partially adaptive algorithm doubles throughput with respect to the
deterministic one when messages are sent locally. The fully
adaptive algorithm performs even better, reaching a throughput
three times higher than the deterministic algorithm and 50% higher
than the partially adaptive algorithm. Latency is also smaller for
the full range of accepted traffic.
When locality increases even more, the benefits of using
adaptive algorithms are smaller because the distance between the
source and destination nodes is short, and the number of
alternative paths is much smaller.
Image 2 shows the average message latency versus normalized accepted
traffic when messages are uniformly distributed inside a cube
centered at the source node with each side equal to two channels.
In this case, partially and fully adaptive algorithms perform
almost the same. All the improvement with respect to the
deterministic algorithm comes from better utilization of virtual
channels.
Collective Communication Support
Multiaddress Encoding Schemes
The header of multi-destination messages must carry the addresses
of the destination nodes. The header information is an overhead to
the system, increasing message latency and reducing the effective
network bandwidth. A good multiaddress encoding scheme should
minimize the message header length, also reducing the header
processing time.
In wormhole and VCT switching, the routing algorithm is executed before the whole message arrives at the router. As the header may require several flits to encode the destination addresses, it is desirable that the routing decision in each router be made as soon as possible to reduce message latency. Ideally, a message header should be processed on the fly as header flits arrive.
When the number of destination addresses is variable, it is inefficient to use a counter to indicate the number of destinations. Such a counter would have to be placed at the beginning of the message header, and since its value may be changed at a router if the destination set is split into several subsets, it would prevent the processing of message headers on the fly. An alternative approach is to use an end-of-header (EOH) flit to indicate the end of the header. Another approach consists of using 1 bit in each flit to distinguish between header and data flits.
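The EOH approach can be sketched as follows; the flit representation and the EOH sentinel value are illustrative assumptions, not a real router interface:

```python
# Minimal sketch of on-the-fly header processing with an
# end-of-header (EOH) marker flit. Flit values and the EOH
# sentinel are illustrative.
EOH = -1  # sentinel flit marking the end of the header

def process_message(flits):
    """Consume flits one at a time, collecting destination addresses
    until the EOH flit arrives; the remaining flits are data."""
    destinations, data = [], []
    in_header = True
    for flit in flits:
        if in_header:
            if flit == EOH:
                # header complete: the routing decision can be made
                # here, without waiting for the data flits
                in_header = False
            else:
                destinations.append(flit)
        else:
            data.append(flit)
    return destinations, data

# a multi-destination message: header lists nodes 3, 7, 12, then data
dests, payload = process_message([3, 7, 12, EOH, 0xAB, 0xCD])
```

Because no destination count precedes the addresses, a router that splits the destination set into subsets only has to rewrite the address flits themselves, which is what makes on-the-fly processing possible.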
Message Switching Layer
Pipelined Circuit Switching
In many environments, rather than minimizing message latency or
maximizing network throughput, the overriding issue is the ability
to tolerate the failure of network components such as routers and
links. In wormhole switching, header flits containing routing
information establish a path through the network from source to
destination. Data flits are pipelined through the path immediately
following the header flits. If the header cannot progress due to a
faulty component, the message is blocked in place indefinitely,
holding buffer resources and blocking other messages. This
situation can eventually result in a deadlocked configuration of
messages. While techniques such as adaptive routing can alleviate
the problem, they cannot by themselves solve it. This has
motivated the development of different switching techniques.
Encryption / decryption: time-efficient algorithms
Algorithm: RC4
Key size(s): 40-1024 bits
Speed: Very fast; does not depend on key size.
Security / comments: Of questionable security; maybe secure for
moderate numbers of encrypted sessions of moderate length. RC4 has
the redeeming feature of being fast. However, it has various
weaknesses in the random number sequence that it uses: see Klein
(2008).

Algorithm: Blowfish
Key size(s): 128-448 bits
Speed: Fast; does not depend on key size.
Security / comments: Believed secure, but with less attempted
cryptanalysis than other algorithms. Attempts to cryptanalyze
Blowfish soon after publication were promising (Schneier, 1995 &
1996), but, unlike AES, it does not appear to have received much
attention recently in the cryptographic literature. Blowfish has
been superseded by Twofish, but the latter is not supported as
standard in Java (at least, not in Sun's JDK).

Algorithm: AES
Key size(s): 128, 192, 256 bits
Speed: Fast; depends on key size.
Security / comments: Secure, though with some reservations from the
crypto community. It has the advantage of allowing a 256-bit key
size, which should protect against certain future attacks
(collision attacks and potential quantum computing algorithms) that
would have 2^64 complexity with a 128-bit key and could become
viable in the lifetime of your data.

Algorithm: DES
Key size(s): 56 bits
Speed: Slow.
Security / comments: Insecure: a $10,000 Copacobana machine can find
a DES key in an average of a week, as (probably) could a botnet
with thousands of machines. The simple answer is: "Don't use it,
it's not safe" (RFC 4772).

Algorithm: Triple DES
Key size(s): 112/168 bits, but equivalent security of 80/112 bits
Speed: Very slow; does not depend on key size.
Security / comments: Moderately secure, especially for small data
sizes. The 168-bit variant was estimated by NIST (2006) to keep
data secure until 2030.
Triple DES performs three DES operations (encrypt-decrypt-encrypt), using either two or three different keys. The 168-bit (three-key) variant of Triple-DES is generally considered to offer "112 bits of security", due to a so-called meet-in-the-middle attack. AES offers a higher level of security for lower CPU cost.
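The meet-in-the-middle attack mentioned above can be illustrated with a toy cipher. The 8-bit cipher below is purely illustrative (not DES), but it shows why double encryption with two n-bit keys costs an attacker roughly 2^(n+1) cipher operations instead of the naive 2^(2n): encrypt forward under every first key, decrypt backward under every second key, and match in the middle.

```python
def toy_encrypt(block, key):
    # toy 8-bit "cipher": XOR with the key, then rotate left by 3
    # (illustrative only; has none of the strength of a real cipher)
    x = (block ^ key) & 0xFF
    return ((x << 3) | (x >> 5)) & 0xFF

def toy_decrypt(block, key):
    # inverse: rotate right by 3, then XOR with the key
    x = ((block >> 3) | (block << 5)) & 0xFF
    return (x ^ key) & 0xFF

# double encryption with two independent 8-bit keys
k1, k2 = 0x3A, 0xC5
pt = 0x7E
ct = toy_encrypt(toy_encrypt(pt, k1), k2)

# meet-in-the-middle: tabulate all forward encryptions of the
# plaintext, then match against all backward decryptions of the
# ciphertext -- about 2 * 2^8 cipher operations instead of 2^16
forward = {}
for ka in range(256):
    forward.setdefault(toy_encrypt(pt, ka), []).append(ka)

candidates = []
for kb in range(256):
    mid = toy_decrypt(ct, kb)
    for ka in forward.get(mid, []):
        candidates.append((ka, kb))

# the true key pair is among the candidates; a second known
# plaintext/ciphertext pair would narrow the list further
assert (k1, k2) in candidates
```

The same bookkeeping applied to double DES reduces the attack to roughly 2^57 operations, which is why Triple DES is used and why even three-key Triple DES is credited with only about 112 bits of security.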
Performance
Network Design Considerations
Interconnection networks play a major role in the performance of
modern parallel computers. There are many factors that may affect
the choice of an appropriate interconnection network for the
underlying parallel computer. These factors include the
following:
1.
Performance requirements. Processes executing in different
processors synchronize and communicate through the interconnection
network. These operations are usually performed by explicit message
passing or by accessing shared variables. Message latency is the
time elapsed between the time a message is generated at its source
node and the time the message is delivered at its destination node.
Message latency directly affects processor idle time and memory
access time to remote memory locations. Also, the network may
saturate; that is, it may be unable to deliver the flow of messages injected
by the nodes, limiting the effective computing power of a parallel
computer. The maximum amount of information delivered by the
network per time unit defines the throughput of that network.
2.
Scalability. A scalable architecture implies that as more
processors are added, their memory bandwidth, I/O bandwidth, and
network bandwidth should increase proportionally. Otherwise the
components whose bandwidth does not scale may become a bottleneck
for the rest of the system, decreasing the overall efficiency
accordingly.
3.
Incremental expandability. Customers are unlikely to purchase a
parallel computer with a full set of processors and memories. As
the budget permits, more processors and memories may be added until
a system's maximum configuration is reached. In some
interconnection networks, the number of processors must be a power
of 2, which makes them difficult to expand. In other cases,
expandability is provided at the cost of wasting resources. For
example, a network designed for a maximum size of 1,024 nodes may
contain many unused communication links when the network is
implemented with a smaller size. Interconnection networks should
provide incremental expandability, allowing the addition of a small
number of nodes while minimizing resource wasting.
4.
Partitionability. Parallel computers are usually shared by several
users at a time. In this case, it is desirable that the network
traffic produced by each user does not affect the performance of
other applications. This can be ensured if the network can be
partitioned into smaller functional subsystems. Partitionability
may also be required for security reasons.
5.
Simplicity. Simple designs often lead to higher clock frequencies
and may achieve higher performance. Additionally, customers
appreciate networks that are easy to understand because it is
easier to exploit their performance.
6.
Distance span. This factor may lead to very different
implementations. In multicomputers and DSMs, the network is
assembled inside a few cabinets. The maximum distance between nodes
is small. As a consequence, signals are usually transmitted using
copper wires. These wires can be arranged regularly, reducing the
computer size and wire length. In NOWs, links have very different
lengths and some links may be very long, producing problems such as
coupling, electromagnetic noise, and heavy link cables. The use of
optical links solves these problems, equalizing the bandwidth of
short and long links up to a much greater distance than when copper
wire is used. Also, geographical constraints may impose the use of
irregular connection patterns between nodes, making distributed
control more difficult to implement.
7.
Physical constraints. An interconnection network connects
processors, memories, and/or I/O devices. It is desirable for a
network to accommodate a large number of components while
maintaining a low communication latency. As the number of
components increases, the number of wires needed to interconnect
them also increases. Packaging these components together usually
requires meeting certain physical constraints, such as operating
temperature control, wiring length limitation, and space
limitation. Two major implementation problems in large networks are
the arrangement of wires in a limited area and the number of pins
per chip (or board) dedicated to communication channels. In other
words, the complexity of the connection is limited by the maximum
wire density possible and by the maximum pin count. The speed at
which a machine can run is limited by the wire lengths, and the
majority of the power consumed by the system is used to drive the
wires. This is an important and challenging issue to be considered.
Different engineering technologies for packaging, wiring, and
maintenance should be considered.
8.
Reliability and repairability. An interconnection network should be
able to deliver information reliably. Interconnection networks can
be designed for continuous operation in the presence of a limited
number of faults. These networks are able to send messages through
alternative paths when some faults are detected. In addition to
reliability, interconnection networks should have a modular design,
allowing hot upgrades and repairs. Nodes can also fail or be
removed from the network. In particular, a node can be powered off
in a network of workstations. Thus, NOWs usually require some
reconfiguration algorithm for the automatic reconfiguration of the
network when a node is powered on or off.
9.
Expected workloads. Users of a general-purpose machine may have
very different requirements. If the kind of applications that will
be executed in the parallel computer are known in advance, it may
be possible to extract some information on usual communication
patterns, message sizes, network load, and so on. That information
can be used for the optimization of some design parameters. When it
is not possible to get information on expected workloads, network
design should be robust; that is, design parameters should be
selected in such a way that performance is good over a wide range
of traffic conditions.
10.
Cost constraints. Finally, it is obvious that the “best” network
may be too expensive. Design decisions often are trade-offs between
cost and other design factors. Fortunately, cost is not always
directly proportional to performance. Using commodity components
whenever possible may considerably reduce the overall cost.
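The latency and throughput definitions from factor 1 above can be sketched numerically; all message records below are made up for illustration:

```python
# Toy illustration of the latency/throughput definitions in factor 1.
# Each record is (time_generated, time_delivered, size_in_flits);
# the numbers are invented for the example.
messages = [(0, 12, 4), (2, 20, 4), (5, 19, 4), (8, 30, 4)]

# message latency: delivery time minus generation time, per message
latencies = [delivered - generated for generated, delivered, _ in messages]
avg_latency = sum(latencies) / len(latencies)

# throughput: information delivered per unit time over the
# observation interval
total_flits = sum(size for _, _, size in messages)
interval = (max(d for _, d, _ in messages)
            - min(g for g, _, _ in messages))
throughput = total_flits / interval
```

A network saturates when injected traffic exceeds this deliverable throughput: latencies grow without bound while delivered flits per unit time stay flat.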