You're helping some security analysts monitor a collection of networked computers, tracking the spread of an online virus. There are \(n\) computers in the system, labeled \(C_1, C_2, \ldots, C_n\), and as input you're given a collection of trace data indicating the times at which pairs of computers communicated. Thus the data is a sequence of ordered triples \((C_i, C_j, t_k)\); such a triple indicates that \(C_i\) and \(C_j\) exchanged bits at time \(t_k\). There are \(m\) triples total.
We'll assume that the triples are presented to you in sorted order of time. For purposes of simplicity, we'll assume that each pair of computers communicates at most once during the interval you're observing.
The security analysts you're working with would like to be able to answer questions of the following form: If the virus was inserted into computer \(C_a\) at time \(x\), could it possibly have infected computer \(C_b\) by time \(y\)? The mechanics of infection are simple: if an infected computer \(C_i\) communicates with an uninfected computer \(C_j\) at time \(t_k\) (in other words, if one of the triples \((C_i, C_j, t_k)\) or \((C_j, C_i, t_k)\) appears in the trace data), then computer \(C_j\) becomes infected as well, starting at time \(t_k\). Infection can thus spread from one machine to another across a sequence of communications, provided that no step in this sequence involves a move backward in time. Thus, for example, if \(C_i\) is infected by time \(t_k\), and the trace data contains triples \((C_i, C_j, t_k)\) and \((C_j, C_q, t_r)\), where \(t_k \leq t_r\), then \(C_q\) will become infected via \(C_j\). (Note that it is okay for \(t_k\) to be equal to \(t_r\); this would mean that \(C_j\) had open connections to both \(C_i\) and \(C_q\) at the same time, and so a virus could move from \(C_i\) to \(C_q\).)
For example, suppose \(n=4\), the trace data consists of the triples
\((C_1, C_2, 4), (C_2, C_4, 8), (C_3, C_4, 8), (C_1, C_4, 12)\)
and the virus was inserted into computer \(C_1\) at time 2. Then \(C_3\) would be infected at time 8 by a sequence of three steps: first \(C_2\) becomes infected at time 4, then \(C_4\) gets the virus from \(C_2\) at time 8, and then \(C_3\) gets the virus from \(C_4\) at time 8. On the other hand, if the trace data were
\((C_2, C_3, 8), (C_1, C_4, 12), (C_1, C_2, 14)\)
and again the virus was inserted into computer \(C_1\) at time 2, then \(C_3\) would not become infected during the period of observation: although \(C_2\) becomes infected at time 14, we see that \(C_3\) only communicates with \(C_2\) before \(C_2\) was infected. There is no sequence of communications moving forward in time by which the virus could get from \(C_1\) to \(C_3\) in this second example.
Design an algorithm that answers questions of this type: given a collection of trace data, the algorithm should decide whether a virus introduced at computer \(C_a\) at time \(x\) could have infected computer \(C_b\) by time \(y\). The algorithm should run in time \(O(m+n)\).
Answer: The trace data can be represented by an undirected graph in which the computers appearing in the trace data are the vertices \((C_1, C_2, \ldots, C_i, \ldots, C_j, \ldots)\), and each edge between two vertices \(C_i\) and \(C_j\) is labeled with the time \(t_k\) at which they communicated.
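As a minimal sketch of this representation, the triples can be loaded into an adjacency list whose entries carry the communication time. The function name `build_graph` and the use of plain integers as computer labels are assumptions made for illustration; they do not come from the problem statement.

```python
from collections import defaultdict

def build_graph(triples):
    """Build an undirected, time-labeled adjacency list.

    triples: iterable of (ci, cj, t) tuples (hypothetical encoding of the trace data).
    Returns a dict mapping each computer to a list of (neighbor, time) pairs.
    """
    adj = defaultdict(list)
    for ci, cj, t in triples:
        # The communication is symmetric, so record the edge in both directions.
        adj[ci].append((cj, t))
        adj[cj].append((ci, t))
    return adj

# First example from the question: C1..C4 encoded as 1..4.
graph = build_graph([(1, 2, 4), (2, 4, 8), (3, 4, 8), (1, 4, 12)])
```

Building the structure touches each triple once, so it takes O(m) time plus the space for the vertices that actually appear.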
This problem can be solved by finding a path between the two given computers (so they must lie in the same connected component) and checking whether they are linked via intermediate computers along edges whose times never decrease. If so, the second computer may be infected by the first, provided the first computer is already infected. This can be found by performing a BFS from each vertex of the graph.
Let C be the set of vertices that are reachable from a given vertex s in the graph. A formal algorithm can be stated as:
-----------------------------------------------------
For all vertices s in graph
    Let C store all nodes reachable (by some path) from s
    Initialize: C = {s}
    While there is an edge (u,v) where u is in C, but v is not in C
        Add v to C
    End While
    For all vertices in C except s
        If s is already infected
            If tk (time of the edge between s and some u) >= t (time at which s became infected)
                Flag u as infected
            Else If tk (time of the edge into some u) <= tr (time of the edge between u and some v)
                Flag v as infected, provided u is already marked as infected
            End If
        End If
    End For
End For
-----------------------------------------------
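The pseudocode above explores from every vertex, which can cost more than \(O(m+n)\). One way to stay within the required bound, sketched here as a different technique rather than a transcription of the pseudocode, is to start only from \(C_a\) and scan the time-sorted triples once, running a small BFS over each batch of triples that share the same timestamp (to handle the "equal times" case from the question). The function name `could_infect` and the integer labels for computers are assumptions for illustration.

```python
from collections import defaultdict, deque
from itertools import groupby

def could_infect(triples, a, x, b, y):
    """Return True if a virus inserted at computer a at time x could reach b by time y.

    triples: list of (ci, cj, t) tuples, assumed already sorted by t.
    """
    if y < x:
        return False
    infected = {a}
    # Process communications in batches that share the same timestamp.
    for t, batch in groupby(triples, key=lambda triple: triple[2]):
        if t < x:
            continue  # happened before the virus was inserted
        if t > y:
            break     # past the deadline; later triples cannot help
        # Build a small graph containing only this batch's edges.
        adj = defaultdict(list)
        for ci, cj, _ in batch:
            adj[ci].append(cj)
            adj[cj].append(ci)
        # BFS within the batch, starting from endpoints that are already infected.
        queue = deque(c for c in adj if c in infected)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in infected:
                    infected.add(v)
                    queue.append(v)
    return b in infected

# The two examples from the question, with C1..C4 encoded as 1..4.
first = [(1, 2, 4), (2, 4, 8), (3, 4, 8), (1, 4, 12)]
second = [(2, 3, 8), (1, 4, 12), (1, 2, 14)]
print(could_infect(first, 1, 2, 3, 8))    # True
print(could_infect(second, 1, 2, 3, 14))  # False
```

Each triple is placed in exactly one batch and each batch's BFS does work proportional to the batch size, so the scan takes expected \(O(m)\) time with hash-based sets. Adding \(O(n)\) if the infected flags are kept in an array indexed by computer gives \(O(m+n)\) overall.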