Dataflow Analysis

Author

Bor-Yuh Evan Chang

Published

Thursday, November 13, 2025

\(\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferlab}[3]{ \let\and\qquad \begin{array}{l} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\inferright}[3]{ \let\and\qquad \displaystyle \frac{#2}{#3}\TirName{#1} } \newcommand{\inferrule}[3][]{\inferlab{#1}{#2}{#3}} \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} \)

What questions do you have? What questions does your neighbor have?

Abstract Transition Semantics

JavaScripty Commands and Transitions

\[ \begin{array}{rrrl} \text{commands} & c& \mathrel{::=}& \texttt{;} \mid x \mathrel{\texttt{=}}e \mid\mathbf{assume}\;e \mid c_1 \mathbin{\texttt{;}} c_2 \\ \\ \text{transitions (i.e., control-flow edges)} & t& \mathrel{::=}& \ell_1 \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell_2 \\ \text{locations (i.e., control-flow nodes)} & \ell \\ \\ \text{programs or procedures (i.e., control-flow graphs)} & g& \mathrel{::=}& \circ \mid g,t \end{array} \]

A control-flow graph is a set of a control-flow edges labeled with commands between control locations or program points.

A sequence of atomic commands (i.e., no sequencing or other control-flow) is sometimes called a basic block. A control-flow graph is also often drawn as a flowchart with flow edges between basic-block nodes.

Exercise: Compiling to Control-Flow Graphs

Compile the following example program (for computing \(\texttt{x}^\texttt{n}\)) to a control-flow graph using the line numbers as control-flow locations:

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }
  

Compiling to Control-Flow Graphs

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }
  
flowchart TD
  L0((0))
  L1((1))
  L2((2))
  L3((3))
  L4((4))
  L5((5))
  L6((6))

  L0 -->|"r = 1"| L1
  L1 --> L2
  L2 -->|"assume !(n > 0)"| L6
  L2 -->|"assume n > 0"| L3
  L3 -->|"r = r * x"| L4
  L4 -->|"n = n - 1"| L5
  L5((5))
  L5 --> L2

Compiling to Control-Flow Graphs

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }
  
flowchart TD
  L0((0))
  L1((1))
  L2((2))
  Lcond{n > 0?}
  L3((3))
  L5((5))
  L6((6))

  L0 -->|"r = 1"| L1
  L1 --> L2
  L2 --> Lcond
  Lcond -->|T| L3
  Lcond -->|F| L6
  L3 -->|"r = r * x
n = n - 1"| L5
  L5 --> L2

Exercise: Transition Systems

\[ \text{(control-flow) state} \quad \sigma\quad \mathrel{::=}\quad \ell\colon \rho \]

Define the judgment form \(\sigma \rightarrow_{g} \sigma'\) that says, “Control-flow state \(\sigma\) steps to state \(\sigma'\) in program \(g\).”

Transition Systems

\[ \fbox{$\sigma \rightarrow_{g} \sigma'$} \]

\(\inferrule[DoTransition]{ \rho \vdash c \Downarrow\rho' \and \ell \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell' \in g }{ \ell\colon \rho \rightarrow_{g} \ell'\colon \rho' }\)

Exercise: Collecting Semantics: Forward Reachability

\[ \text{state properties} \quad \Sigma\quad \mathrel{::=}\quad \circ\mid\Sigma,\sigma \]

Define a collecting semantics \(\lbrace\!\lbrack\!\lbrack g \rbrack\!\rbrack\!\rbrace = \lambda \Sigma.\Sigma'\) for forward reachability.

Collecting Semantics: Forward Reachability

\[ \fbox{$\lbrace\!\lbrack\!\lbrack g \rbrack\!\rbrack\!\rbrace = \lambda \Sigma.\Sigma'$} \]

\[ \begin{array}{rrl} \lbrace\!\lbrack\!\lbrack g \rbrack\!\rbrack\!\rbrace\,\Sigma_0 & \mathrel{:=}& \operatorname{lfp}\,(\lambda \Sigma. \Sigma_0 \cup \bigcup_{\sigma\in \Sigma} \{ \sigma' \mid \sigma \rightarrow_{g} \sigma' \} ) \end{array} \]

Exercise: Abstract Semantics: Forward Reachability

\[ \begin{array}{rrrl} \text{abstract states} & \hat{\sigma}& \mathrel{::=}& \ell\colon \hat{\rho} \\ \text{abstract state properties (i.e., program invariants)} & \hat{\Sigma}& \mathrel{::=}& \circ\mid\hat{\Sigma},\hat{\sigma} \end{array} \]

Define an abstract semantics \(\lbrack\!\lbrack g \rbrack\!\rbrack^\sharp = \lambda \hat{\Sigma}.\hat{\Sigma}'\) for forward reachability.

Abstract Semantics: Forward Reachability

\[ \fbox{$\lbrack\!\lbrack g \rbrack\!\rbrack^\sharp = \lambda \hat{\Sigma}.\hat{\Sigma}'$} \]

\[ \begin{array}{rrl} \lbrack\!\lbrack g \rbrack\!\rbrack^\sharp\,\hat{\Sigma}_0 & \mathrel{:=}& \operatorname{lfp}\,(\lambda \hat{\Sigma}. \hat{\Sigma}_0 \mathbin{\sqcup} \bigsqcup_{\hat{\sigma}\in \hat{\Sigma}} \{ \hat{\sigma}' \mid \hat{\sigma} \rightarrow_{g} \hat{\sigma}' \} ) \end{array} \]

where \(\mathbin{\sqcup}\) on abstract states \(\ell\colon \hat{\rho}\) is a point-wise lifting of \(\mathbin{\sqcup}\) on abstract stores \(\hat{\rho}\) at each program location \(\ell\).

The above is dataflow analysis — where \(\hat{\rho}\) is restricted to a finite height domain.

Dataflow Analysis

Exercise: Dataflow Equations

For a given program \(g\), the set of locations is finite. Describe a program invariant \(\hat{\Sigma}\) as a finite set of dataflow equations for each program location. Assume an entry location \(\ell_0\) to the procedure \(g\) with an initial abstract store \(\hat{\rho}_0\).

Dataflow Equations

Definition 1 (Dataflow Equations) An inductive program invariant \(\hat{\Sigma}\) is a solution to the set of dataflow equations for program \(g\) with an entry location \(\ell_0\) and initial abstract store \(\hat{\rho}_0\): \[ \begin{array}{rrll} \hat{\Sigma}( \ell_0 ) & = & \hat{\rho}_0 \\ \hat{\Sigma}( \ell_i ) & = & \bigsqcup_{ \ell_j \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell_i \in g} \{ \hat{\rho}' \mid \hat{\Sigma}(\ell_j) \vdash c \Downarrow\hat{\rho}' \} & \text{for $i \neq 0$} \end{array} \]

We solve a dataflow analysis by finding the least fixed point of the set of transfer functions (i.e., one for each program location \(\ell_i\) where \(i \neq 0\)).

Exercise: Chaotic Iteration

\[ \begin{array}{rrll} \hat{\Sigma}( \ell_0 ) & = & \hat{\rho}_0 \\ \hat{\Sigma}( \ell_i ) & = & \bigsqcup_{ \ell_j \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell_i \in g} \{ \hat{\rho}' \mid \hat{\Sigma}(\ell_j) \vdash c \Downarrow\hat{\rho}' \} & \text{for $i \neq 0$} \end{array} \]

The chaotic iteration algorithm computes the least fixed-point invariant \(\hat{\Sigma}_\omega\) by iterating at each location \(\ell_i\) until convergence.

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }
  

Fill in the following table where each \(\hat{\Sigma}_i\) is the \(i\)th iterate using the sign abstract domain. Cell \(\hat{\Sigma}_0(\ell_0)\) is \(\hat{\rho}_0\), which consider to be \(\top\).

\(\hat{\Sigma}_0\) \(\hat{\Sigma}_1\) \(\hat{\Sigma}_2\) \(\hat{\Sigma}_3\) \(\hat{\Sigma}_4\) \(\hat{\Sigma}_5\) \(\hat{\Sigma}_6\) \(\hat{\Sigma}_7\) \(\hat{\Sigma}_8\) \(\hat{\Sigma}_9\) \(\hat{\Sigma}_{10}\) \(\hat{\Sigma}_\omega\)
\(\ell_0\) \(\top\)
\(\ell_1\) \(\bot\)
\(\ell_2\) \(\bot\)
\(\ell_3\) \(\bot\)
\(\ell_4\) \(\bot\)
\(\ell_5\) \(\bot\)
\(\ell_6\) \(\bot\)