In: Computer Science
Instructions:
Question 1:
A directed graph G consists of • vertices (V ) (also called nodes) • edges (E) (also called links). An edge can have extra data. The number of nodes |V | and the number of edges |E| are denoted by n and m respectively. An undirected graph is like a directed graph except the edges do not have direction. Some more definitions: • An undirected graph is connected if there exists a path between all u, v ∈ V . We can extend this to directed graphs. • A cycle is a path from a node to itself. • An undirected graph is called a tree if it is connected and acyclic. We can choose a node to be the root and thus have a rooted tree. We discussed trees in the module. • We call a graph dense if m ∈ (n 2 ) and sparse of m ∈ O(n). A representation of graphs must allow the following operations: • Get(i, j): checks if there is an edge from i to j. It also returns the extra data on this edge. • Add(i, j, w) adds an edge from i to j (or between i and j), together with extra data w. • AllFrom(i) returns a list of the edges with source i. There are three popular way of representing (and storing) graph data: Adjacency matrix, adjacency list and edge list.
Part a: Where the entry at row i and column j specifies whether there is an edge from i to j (with the weight or extra data on that edge). Argue about the time complexity (in Θ notations) of Get, Add, and AllFrom in this representation. (Adjacency Matrix)
Part b: For each i associates a list of nodes j (with extra data) such that there is an edge from i to j. Argue about the time complexity (in Θ notations) of Get, Add, and AllFrom in this representation. (Adjacency List)
Part c: Each edge (i, j) associates a list of nodes j (with extra data) such that there is an edge from i to j. Argue about the time complexity (in Θ notations) of Get, Add, and AllFrom in this representation. (Edge List)
Part d: There are a lot of network data libraries on web. See for example Konect that hosts a large array of networks with different types. Choose two undirected graphs, I suggest one small (n < 1000) if you want to visualize it and one large (arbitrary) and read them using your code. Write a python script that can read the chosen network data then cast the data into a Graph class using NetworkX or igraph, two useful python libraries to build your graph object (and visualize them if you want). Based on your search on network libraries, what is the common way of storing network data? Why? Discuss the trade-offs of using the abovementioned representations in sparse and dense graphs. (Comparison and implementation)
Part(a) Adjacency Matrix representation:
An adjacency matrix is stored as a two-dimensional array. Since we can access each element of an array in O(1) time because all elements are stored contigously in memory and we already know the base address, therefore we can get the address and hence access any element in the array in a constant time using array indexing.
Get(i,j) : For this we just have to write A[i][j] (assuuming A is the 2-d array representing the adjacency matrix). A[i][j] will contain the weight (if there is an edge). Hence it takes O(1) time only.
Add(i,j,w) : To add an edge from i to j with weight w, just asssign A[i][j] = w. Hence this also takes O(1) time.
AllFrom(i): To get the list of the edges starting from i, we have to scan the row corresponding to the node i. Since there are n nodes, there are n rows and n columns (A is of size n*n). While scanning all the elements in a row, we check if the entry is positive. If positive, it means edge is present and we add it to the list, if negative or zero, we discard and dont add to the list. Hence the time complexity for AllFrom(i) is O(n).
Part(b) Adjacency List representation: For each node i, we maintain a list (linked list) which contains all the nodes which have an edge starting from i. So in total, the graph is stored as an array of linked lists.
Get(i,j): To check if there is an edge from i to j, we have to scan the list corresponding to the node i. Now in the worst case, all the nodes can be adjacent to node i, in which case we have to check all the n nodes in the list to check if j is present. Hence, time complexity for this is O(n)
Add(i,j,w): To add an edge from i to j, we have to insert the node j and the weight of edge - w in the list corresponding to i. Since in a linked list, we maintain a pointer, which either points to starting of list, or ending or both in some cases, therefore we can add the node j in the list just by creating a new node at the end or front wherever the pointer is. This takes O(1) time only.
AllFrom(i): To get the list of the edges starting from i, we have to scan the list corresponding to the node i. Now in the worst case, all the nodes can be adjacent to node i, in which case time complexity for this is O(n).
Part(c) Edge List: In edge list representation, we have a list (array) of edges, where each edge is a tuple or an element (i,j,w).
Get(i,j): To know if there is an edge from i to j, we have no option but to check the entire edgelist for that edge. Hence in worst case this takes O(m) time. where m is the total no of edges.
Note that if it is a special implementation of edge list, like if we use some ordering (ascending or descending based on first source node number) to store the edges, then we can use binary search to query, which will take O(log m) time.
Add(i,j,w): To add an edge from i to j with weight w, we can just insert the edge at the last or front of the list, hence it will take O(1) time only.
AllFrom(i): To get the list of the edges starting from i, we have to scan the entire edgelist corresponding to the source node i. Hence, time complexity for this is O(m).