Graphs - Intro

# CSCI-UA 102
## Data Structures

## Graphs - Introduction

.license[
Copyright 2020 Joanna Klukowska. Unless noted otherwise all content is released under a 
[Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).]

---
layout:true
template: default
name: section
class: inverse, middle, center

---
layout:true
template: default
name: breakout
class: breakout, middle

---

layout:true
template:default
name:slide
class: slide

---

# Graphs 
## Terminology and Definitions

---
## Graphs: terminology and definitions

- Graphs consist of 
 - _nodes_ / _vertices_; the nodes on the right are: A, B, C, D, E, F, G, H, J
 - _edges_, the connections between the nodes; there are eleven edges on the right: A-B, A-F, A-H, B-C, B-E, C-D, C-J, D-G, D-F, E-D, F-H (they are listed in no particular order)

--
- A _path in a graph_ is a sequence of edges that leads from one node to another. In the graph on the right there are several paths from A to G. One of them is A-B-E-D-G. 
- The _length of a path_ is the number of edges in it. The path mentioned above has length 4. 
- A _cycle_ is a path in which the first and last nodes are the same. For example, A-B-E-D-F-A is a cycle.

- Two nodes are _adjacent_ (or are _neighbors_) if there is an edge between them.
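
The definitions above can be checked with a few lines of Python. This is only a sketch: the edge list is copied from the list of eleven edges above, and the helper names (`adjacent`, `is_path`, `path_length`, `is_cycle`) are made up for illustration.

```python
# The eleven undirected edges listed above. Each edge is stored as a
# frozenset so that A-B and B-A are treated as the same edge.
EDGES = [("A", "B"), ("A", "F"), ("A", "H"), ("B", "C"), ("B", "E"),
         ("C", "D"), ("C", "J"), ("D", "G"), ("D", "F"), ("E", "D"),
         ("F", "H")]
EDGE_SET = {frozenset(edge) for edge in EDGES}


def adjacent(u, v):
    """Two nodes are adjacent if there is an edge between them."""
    return frozenset((u, v)) in EDGE_SET


def is_path(nodes):
    """A node sequence is a path if each consecutive pair is joined by an edge."""
    return all(adjacent(u, v) for u, v in zip(nodes, nodes[1:]))


def path_length(nodes):
    """The length of a path is its number of edges (one less than its nodes)."""
    return len(nodes) - 1


def is_cycle(nodes):
    """A cycle is a path in which the first and last nodes are the same."""
    return len(nodes) > 1 and nodes[0] == nodes[-1] and is_path(nodes)


path = list("ABEDG")
print(is_path(path), path_length(path))  # True 4
print(is_cycle(list("ABEDFA")))          # True
print(adjacent("A", "G"))                # False
```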

---
## Graphs: terminology and definitions

.right-column2[.center[
<img width="350px" alt="undirected connected graph" src="../img/15/graphs-1.png">

<img width="350px" alt="undirected disconnected graph" src="../img/15/graphs-5.png">
]]

- A graph is _connected_ if there exists a path between every pair of nodes, like the graph on the upper right.

--
- A graph is _disconnected_ if it is NOT connected, like the graph on the lower right. 
- The parts of a disconnected graph are called its _connected components_.

--
- A _tree_ is a connected graph with no cycles, like the graph below. 
<img width="350px" alt="undirected connected graph" src="../img/15/graphs-6.png">

---
## Graphs: terminology and definitions

- A _directed graph_ contains edges that can be traversed in one direction only.

--
 - there is an edge from H to A, but there is no edge from A to H
 
--
 - a path from B to D is B->E->D, but a path from D to B is D->J->C->B

- If an edge exists in both directions, it is drawn as two directed edges, like B->C and C->B (or as a single arrow that points in both directions, like B<->C).

---
## Graphs: terminology and definitions

.right-column2[.center[
<img width="350px" alt="directed graph" src="../img/15/graphs-4.png">
]]

- A _weighted graph_ contains edges that have _weights_ assigned to them.

- The graph on the upper right is an _undirected weighted graph_.

- The graph on the lower right is a _directed weighted graph_.

---

# Graph Representations

---
name:adj-list

## Adjacency List

.right-column2[
```
    class Node {
        label/data 
        list of nodes adjacent to it
    }
```
]

- Each node is assigned an _adjacency list_ of nodes that are adjacent to it.

- The adjacency list can be stored as a linked list or an array.

- The graph itself is represented as a list of nodes.

---
template:adj-list

An adjacency list representation for the graph on the right could be 
a list as follows:

```
    A, [B, F, H]
    B, [A, C, E]
    C, [B, D, J]
    D, [C, E, F, G]
    E, [B, D]
    F, [A, D, H]
    G, [D]
    H, [A, F]
    J, [C]
```

(each row represents a single node).

Note that in an undirected graph, if a node is listed in the adjacency list of another node, then 
the symmetric case has to be true as well (since the adjacency relation is symmetric). For example, if E is adjacent to D, then D is adjacent to E.
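
One way to build such a list in code is sketched below. It assumes the graph is given as the eleven undirected edges from the terminology slides. The function name `build_adjacency_list` and the use of a Python dictionary (instead of a list of `Node` objects) are illustrative choices, not the required implementation.

```python
from collections import defaultdict

# The eleven undirected edges of the example graph.
EDGES = [("A", "B"), ("A", "F"), ("A", "H"), ("B", "C"), ("B", "E"),
         ("C", "D"), ("C", "J"), ("D", "G"), ("D", "F"), ("E", "D"),
         ("F", "H")]


def build_adjacency_list(edges):
    """Map each node label to the sorted list of its neighbors."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)  # v is adjacent to u ...
        adj[v].append(u)  # ... and, by symmetry, u is adjacent to v
    return {node: sorted(neighbors) for node, neighbors in sorted(adj.items())}


adj = build_adjacency_list(EDGES)
for node, neighbors in adj.items():
    print(node, neighbors)

# Symmetry check: whenever v is in the list of u, u is in the list of v.
print(all(u in adj[v] for u in adj for v in adj[u]))  # True
```

Every undirected edge appears twice (once in each endpoint's list), so the lists hold 22 entries in total for the 11 edges.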

---
template:adj-list

In a directed graph, there is no symmetry.

An adjacency list representation for the graph on the right could be 
a list as follows:

```
    A, [B]
    B, [A, C, E]
    C, [B, D]
    D, [J]
    E, [D, F]
    F, [E]
    G, [D]
    H, [A, F]
    J, [C]
```

(each row represents a single node).
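
The lack of symmetry can be seen in a small sketch that stores the list above as a Python dictionary. The names `ADJ` and `has_edge` are hypothetical and used only for this example.

```python
# Directed adjacency list copied from the rows above:
# ADJ[u] holds the nodes v for which there is an edge u -> v.
ADJ = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["J"],
    "E": ["D", "F"],
    "F": ["E"],
    "G": ["D"],
    "H": ["A", "F"],
    "J": ["C"],
}


def has_edge(adj, u, v):
    """Return True if there is a directed edge from u to v."""
    return v in adj[u]


print(has_edge(ADJ, "H", "A"))  # True
print(has_edge(ADJ, "A", "H"))  # False

# Pairs of nodes that are connected in both directions.
two_way = sorted((u, v) for u in ADJ for v in ADJ[u] if u < v and u in ADJ[v])
print(two_way)  # [('A', 'B'), ('B', 'C'), ('E', 'F')]
```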

---
name:adjacency_matrix

## Adjacency Matrix

- An _adjacency matrix_ is a matrix indicating the edges of the graph.

- It is typically stored as a 2D array in which `matrix[a][b]` indicates
if there is an edge between nodes `a` and `b`
  - for undirected graphs, `matrix[a][b]=matrix[b][a]`
  - for unweighted graphs, the matrix can store binary information
  - for weighted graphs, the matrix values indicate the weights of the edges
- drawback: uses $n^2$ elements for a graph with $n$ nodes, and most of them may be zero (indicating no edge)

---
template:adjacency_matrix

|       | A | B | C | D | E | F | G | H | J | 
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **A** | 0 | **1** | 0 | 0 | 0 | **1** | 0 | **1** | 0 |
| **B** | **1** | 0 | **1** | 0 | **1** | 0 | 0 | 0 | 0 |
| **C** | 0 | **1** | 0 | **1** | 0 | 0 | 0 | 0 | **1** |
| **D** | 0 | 0 | **1** | 0 | **1** | **1** | **1** | 0 | 0 |
| **E** | 0 | **1** | 0 | **1** | 0 | 0 | 0 | 0 | 0 |
| **F** | **1** | 0 | 0 | **1** | 0 | 0 | 0 | **1** | 0 |
| **G** | 0 | 0 | 0 | **1** | 0 | 0 | 0 | 0 | 0 |
| **H** | **1** | 0 | 0 | 0 | 0 | **1** | 0 | 0 | 0 |
| **J** | 0 | 0 | **1** | 0 | 0 | 0 | 0 | 0 | 0 |
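
A matrix like this can be built from a list of edges. The sketch below assumes the eleven undirected edges of the example graph and maps labels to row/column indices in the order A, B, C, D, E, F, G, H, J. The name `build_matrix` is made up for this example.

```python
NODES = list("ABCDEFGHJ")
EDGES = [("A", "B"), ("A", "F"), ("A", "H"), ("B", "C"), ("B", "E"),
         ("C", "D"), ("C", "J"), ("D", "G"), ("D", "F"), ("E", "D"),
         ("F", "H")]


def build_matrix(nodes, edges):
    """Return (index, matrix) for an undirected, unweighted graph."""
    index = {label: i for i, label in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: keep the matrix symmetric
    return index, matrix


index, matrix = build_matrix(NODES, EDGES)
print(matrix[index["A"]][index["F"]])  # 1
print(matrix[index["A"]][index["G"]])  # 0

# Only 2 * 11 = 22 of the 81 entries are non-zero.
print(sum(map(sum, matrix)), "of", len(NODES) ** 2)  # 22 of 81
```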

---
name:edge_list
## Edge List

- An _edge list_ is a list that contains all edges in the graph in some order.

- It is convenient for algorithms that need to traverse all the edges and do not need
to find the edges that start/end at a particular node.

- Elements in an _edge list_ are pairs of nodes indicating the end-points for each edge.

---
template:edge_list

```
    A->B
    B->A
    B->C
    B->E
    C->B
    C->D
    D->J
    E->D
    E->F
    F->E
    G->D
    H->A
    H->F
    J->C
```
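
As a rough illustration of that trade-off, the sketch below stores the edges above as a Python list of pairs. One pass over all edges is simple, while finding the edges of one particular node requires scanning the whole list. The names `EDGE_LIST` and `edges_from` are hypothetical.

```python
from collections import Counter

# Directed edges copied from the list above, as (from, to) pairs.
EDGE_LIST = [("A", "B"), ("B", "A"), ("B", "C"), ("B", "E"), ("C", "B"),
             ("C", "D"), ("D", "J"), ("E", "D"), ("E", "F"), ("F", "E"),
             ("G", "D"), ("H", "A"), ("H", "F"), ("J", "C")]

# Easy: counts that only need a traversal of all the edges.
out_degree = Counter(u for u, _ in EDGE_LIST)
in_degree = Counter(v for _, v in EDGE_LIST)
print(len(EDGE_LIST))                     # 14
print(out_degree["B"], in_degree["D"])    # 3 3
print(in_degree["H"])                     # 0 (no edge ends in H)


def edges_from(edge_list, node):
    """Edges that start at `node`; this has to look at every edge."""
    return [(u, v) for u, v in edge_list if u == node]


print(edges_from(EDGE_LIST, "H"))  # [('H', 'A'), ('H', 'F')]
```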


---
## Implicit Graph

- _Implicit graphs_ are graphs that are not stored using graph data structures (like the ones in the previous slides).

- They are used when the edges do not need to be stored because they can be determined easily from the nodes themselves using some rules. For example:

 - A graph contains N nodes labeled 0 to N-1. There is an edge between 
 two nodes if the sum of their labels is even. 
 For example: nodes with labels 3 and 57 have an edge between them, but nodes with labels 3 and 56 do not. 
 
--
 
 - A 2D grid represents a labyrinth. The cells with passable corridors 
 are marked with '.'. The cells with solid walls are marked with '#'. 
 Each of those cells is an _implicit_ node
 in a graph. There is an _implicit_ edge between two nodes if the two cells share
 one of their four sides and they are both marked with a '.'.
 
 <pre>
 ####.##.##
 #....#..##
 #.####.#.#
 #......#.#
 #####....#
 ##########
 </pre>
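
In code, an implicit graph usually shows up as a function that computes the neighbors of a node on demand. The sketch below covers both examples. The function names are made up, and the first function assumes that a node is not its own neighbor (the rule above does not say either way).

```python
def parity_neighbors(label, n):
    """Neighbors of `label` among nodes 0..n-1: the sum of two labels is even
    exactly when both labels have the same parity.
    Assumption: a node is not connected to itself."""
    return [other for other in range(label % 2, n, 2) if other != label]


GRID = [
    "####.##.##",
    "#....#..##",
    "#.####.#.#",
    "#......#.#",
    "#####....#",
    "##########",
]


def grid_neighbors(grid, row, col):
    """Passable cells that share a side with the passable cell (row, col)."""
    if grid[row][col] != ".":
        return []  # a wall cell has no edges
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        r, c = row + d_row, col + d_col
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] == ".":
            result.append((r, c))
    return result


print(parity_neighbors(3, 8))                                  # [1, 5, 7]
print(57 in parity_neighbors(3, 60), 56 in parity_neighbors(3, 60))  # True False
print(grid_neighbors(GRID, 1, 4))                              # [(0, 4), (1, 3)]
print(grid_neighbors(GRID, 0, 0))                              # []
```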
---

# Graph Traversals

---

## Traversing a Graph

Task: given a starting node in a graph, visit all nodes that can be reached from that node.

- depth first search traversal (DFS)

- breadth first search traversal (BFS)

---
name:dfs
## Depth First Search Traversal

---
template:dfs
- follows a single path through the graph as long as there are un-visited nodes
- after it cannot find any more nodes to visit, it _returns_ to previous nodes and
follows unexplored paths from them
- needs to keep track of visited nodes

---
template:dfs

In what order will the DFS traversal visit the nodes in the graph on the right when we start DFS at A?

There is more than one possibility. The exact order is determined by how the graph is stored and how we access the nodes adjacent to 
a given node.

Let's look at one possible traversal.

---
template:dfs

- start with A (visited: A)

--
- we have three choices for the next node: B, F, or H; let's pick F (visited: A, F)

--
- from F, we can either go to D or H; let's pick H (visited: A,F,H)

--
- at H, the only choices are A and F and they are both visited, so we go back to F and 
    look for other options; the only option is D (visited: A,F,H,D)

--
- from D, we can go to E, G, or C (since F has been visited before); let's pick G
    (visited: A,F,H,D,G)

--
- G has no un-visited neighbors, so we go back to D; let's pick E (visited: A,F,H,D,G,E)

--
- E's only un-visited neighbor is B, so we go there (visited: A,F,H,D,G,E,B)

--
- B's only un-visited neighbor is C, so we go there (visited: A,F,H,D,G,E,B,C)

--
- C's only un-visited neighbor is J, so we go there (visited: A,F,H,D,G,E,B,C,J)

At this point we have visited all the nodes and our traversal is:
 
A, F, H, D, G, E, B, C, J

But an algorithm would need to verify that all the nodes have been visited.

---

Here are two other DFS traversals starting at node A:

- A, B, E, D, F, H, G, C, J

- A, B, C, J, D, E, F, H, G

__Task__

List all possible DFS traversals starting at the node with label J.

---
template:dfs

An algorithm using the adjacency list, `adj`, of a graph with 
`n` nodes and `m` edges; it runs in `O(n+m)` time:

```
  visited[n] - boolean array with all values set initially to false

  dfs ( start )
      if visited[start] return   //already processed this node

      visit/process the node

      visited[start] = true      //mark node as visited

      for node in adj[start]
          dfs( node )
```
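
For reference, here is one way the pseudocode could look as runnable Python. It is a sketch under a few assumptions: the graph is the undirected example graph stored as a dictionary of neighbor lists in alphabetical order, "processing" a node just records it in a list, and a set plays the role of the `visited` array.

```python
# Undirected example graph; neighbors are listed in alphabetical order.
ADJ = {
    "A": ["B", "F", "H"], "B": ["A", "C", "E"], "C": ["B", "D", "J"],
    "D": ["C", "E", "F", "G"], "E": ["B", "D"], "F": ["A", "D", "H"],
    "G": ["D"], "H": ["A", "F"], "J": ["C"],
}


def dfs(adj, start):
    """Return the nodes reachable from `start` in the order DFS visits them."""
    visited = set()
    order = []

    def visit(node):
        if node in visited:       # already processed this node
            return
        visited.add(node)         # mark node as visited
        order.append(node)        # "process" the node
        for neighbor in adj[node]:
            visit(neighbor)

    visit(start)
    return order


print(dfs(ADJ, "A"))
# ['A', 'B', 'C', 'D', 'E', 'F', 'H', 'G', 'J']
```

A different neighbor order in `ADJ` produces a different, equally valid DFS traversal. For very deep graphs the recursion can exceed Python's default recursion limit, in which case an explicit stack is the usual alternative.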
---
name:bfs
## Breadth First Search Traversal

---
template:bfs

- visits the nodes in order of their distance from a starting node
(distance = path length between nodes)
  - visit all nodes whose distance from the start node is 1
  - visit all nodes whose distance from the start node is 2
  - ...

In what order will the BFS traversal visit the nodes in the graph on the right when we start BFS at A?

Again, the exact ordering will depend on how we store the graph and how we access nodes adjacent to a given node.

Here is one possible traversal for the graph 
on the right.

- A, B, F, H, C, E, D, G, J

__Question__ Which of the values in the above traversal could be moved and to where
so that it is still a BFS traversal?

__Task__ Propose a BFS traversal of this graph starting at node J.

---
template:bfs

An algorithm using the adjacency list, `adj`, of a graph with `n` nodes and `m` edges; it runs in `O(n+m)` time:

```
  visited[n] - boolean array with all values set initially to false
  queue      - to store nodes that we need to go back to

  bfs ( start )
      visited[start] = true
      visit/process the start node
      queue.push( start )

      while queue is not empty
          node = queue.pop()           //remove the node at the front of the queue
          for next in adj[node]
              if ! visited[next]
                  visited[next] = true
                  visit/process next
                  queue.push( next )   //add the node at the back of the queue
```
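
A possible Python version of this pseudocode is sketched below. It assumes the undirected example graph stored as a dictionary of alphabetically ordered neighbor lists, and it uses `collections.deque` as the queue. As an addition that is not in the pseudocode, it also records the distance of each node from the start, and that dictionary doubles as the `visited` marker.

```python
from collections import deque

# Undirected example graph; neighbors are listed in alphabetical order.
ADJ = {
    "A": ["B", "F", "H"], "B": ["A", "C", "E"], "C": ["B", "D", "J"],
    "D": ["C", "E", "F", "G"], "E": ["B", "D"], "F": ["A", "D", "H"],
    "G": ["D"], "H": ["A", "F"], "J": ["C"],
}


def bfs(adj, start):
    """Return (visit order, distance of each reachable node from `start`)."""
    distance = {start: 0}          # a node is "visited" once it has a distance
    order = [start]                # "process" the start node
    queue = deque([start])
    while queue:
        node = queue.popleft()     # take the node at the front of the queue
        for neighbor in adj[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                order.append(neighbor)
                queue.append(neighbor)
    return order, distance


order, distance = bfs(ADJ, "A")
print(order)
# ['A', 'B', 'F', 'H', 'C', 'E', 'D', 'J', 'G']
print(distance["D"], distance["J"])  # 2 3
```

The nodes come out grouped by distance: first A, then the three nodes at distance 1, then those at distance 2, and finally those at distance 3.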

---

template:section

# Examples and Things to Think About

---

## Graph Implementation

Graphs serve many different purposes and their exact implementation 
needs to suit a specific purpose, so they are generally not among the 
data structures implemented in programming language libraries.

Implement a graph. 
Pick any implementation you want (or try it with all of them).

Assume that the graph nodes have integer labels starting with 1 up to N.
The graph is undirected and its description is given as a list of edges: `a b` indicates that 
there is an edge between `a` and `b`.

How would your implementation be different for a directed graph?

Modify your code so that it can handle weighted graphs. In this case
the graph description would be given as a list of edges: 
 `a b w` indicates 
that there is an edge from `a` to `b` with weight `w`.

---

## Challenge

Design algorithms for the following problems.

- Given a pair of nodes, find the length of the shortest path from one to 
the other. Solve this for unweighted and weighted graphs.

- Given a node, find the length of the shortest path from that node to all other nodes in the graph. Solve this for unweighted and weighted graphs.

- Determine if a graph is connected or disconnected (algorithmically, not visually). If it is disconnected, figure out the number of nodes in each
connected component.

- Determine if a graph is a tree (algorithmically, not visually).

---
## Solving Problems

Try to use your graph implementation to solve the following problems:

- [Ab Initio](https://open.kattis.com/problems/abinitio) - the description
is long, but it is a fairly easy problem if you have a graph implementation
already

- [Flying Safely](https://open.kattis.com/problems/flyingsafely)

- [Cantina of Babel](https://open.kattis.com/problems/cantinaofbabel) - this is not a difficult problem, but you need to first figure out 
how to map it to a graph.

---
## Connected Components in a Graph

A __connected component__ (or simply a component) of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the supergraph.

- In simpler terms, if you can get from any node in a component to any other node in the same component, and you can't get out of that component to another part of the graph, then it's a connected component.

.left-column2[.center[
<img width="350px" alt="a graph with single connected component" src="../img/15/graph_connected_components_1.png">
]]

.right-column2[.center[
<img width="300px" alt="a graph with three connected components" src="../img/15/graph_connected_components_3.png">

<img width="300px" alt="a graph with three connected components" src="../img/15/graph_connected_components_3_sparse.png">
]]

---

## Why do we care about connected components?

__Social Network Analysis__:

- Community Detection: Identifying groups of tightly knit individuals in a social network. Each connected component could represent a distinct community or isolated group.

- Influence Propagation: If information or a virus spreads through a network, it will only spread within its connected component. This helps in understanding the potential reach of information or disease.

- Identifying Disconnected Users: Finding users who are completely isolated from certain groups or the main network.

__Network Reliability and Redundancy (Computer Networks)__:

- Server Clusters: If a network representing server connections has multiple connected components, it means that some servers cannot communicate with others. This indicates a critical fault or partitioning in the network.

- Resilience Planning: Understanding connected components helps in designing more robust networks by ensuring critical nodes are part of a well-connected component and identifying single points of failure that could split the network into multiple components.

---

## Why do we care about connected components?

__Transportation and Logistics__:

- Road Networks: Identifying if all cities or regions are reachable from each other. If the graph of roads has multiple connected components, it means some areas are completely isolated from others by road.

- Airline Routes: Determining which airports are connected within a single airline's network or across multiple partner airlines.

__Ecosystem Modeling__:

- Habitat Connectivity: In ecological studies, graphs can represent patches of habitat and the corridors connecting them. Connected components identify areas where species can move freely, which is crucial for conservation efforts.

- Disease Spread: Similar to social networks, understanding how connected components in an ecosystem relate to the spread of disease among animal populations.

__and many more ...__

---

## Finding Connected Components

The most common algorithms to find connected components in a graph are Depth-First Search (DFS) and Breadth-First Search (BFS). Both algorithms can be adapted to identify all connected components.

The general approach is as follows:

- Initialize a `visited` array for all vertices to `false`.

- Iterate through each vertex `v` in the graph.

- If `v` has not been visited:
  - Start a DFS (or BFS) from `v`. This marks all vertices reachable from `v` as visited.
  - All vertices reachable from `v` during this traversal belong to the same connected component.
  - Store this set of vertices as a new connected component.

Continue until all vertices have been visited.
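
The approach can be sketched in Python as follows. This is one possible version, not the only one: it uses an iterative DFS with an explicit stack, a set in place of the `visited` array, and a small made-up graph (`ADJ`, with integer labels 1 to 6) that has three components.

```python
def connected_components(adj):
    """Return the connected components of an undirected graph as sorted lists."""
    visited = set()
    components = []
    for v in adj:                      # iterate through each vertex
        if v in visited:
            continue
        component = []                 # v starts a new component
        stack = [v]
        visited.add(v)
        while stack:                   # DFS from v using an explicit stack
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical example: a triangle, a single edge, and an isolated node.
ADJ = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [5], 5: [4], 6: []}

components = connected_components(ADJ)
print(components)                           # [[1, 2, 3], [4, 5], [6]]
print([len(c) for c in components])         # [3, 2, 1]
print(len(components) == 1)                 # False: the graph is disconnected
```

Each vertex is pushed on the stack once and each adjacency list is scanned once, so the whole computation takes `O(n+m)` time, the same as a single traversal.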