Dataflow Analysis

Author

Bor-Yuh Evan Chang

Published

Thursday, November 13, 2025

$\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferlab}[3]{ \let\and\qquad \begin{array}{l} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\inferright}[3]{ \let\and\qquad \displaystyle \frac{#2}{#3}\TirName{#1} } \newcommand{\inferrule}[3][]{\inferlab{#1}{#2}{#3}} \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} $

What questions do you have? What questions does your neighbor have?

Abstract Transition Semantics

JavaScripty Commands and Transitions

\[ \begin{array}{rrrl} \text{commands} & c& \mathrel{::=}& \texttt{;} \mid x \mathrel{\texttt{=}}e \mid\mathbf{assume}\;e \mid c_1 \mathbin{\texttt{;}} c_2 \\ \\ \text{transitions (i.e., control-flow edges)} & t& \mathrel{::=}& \ell_1 \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell_2 \\ \text{locations (i.e., control-flow nodes)} & \ell \\ \\ \text{programs or procedures (i.e., control-flow graphs)} & g& \mathrel{::=}& \circ \mid g,t \end{array} \]

A control-flow graph is a set of a control-flow edges labeled with commands between control locations or program points.

A sequence of atomic commands (i.e., no sequencing or other control-flow) is sometimes called a basic block. A control-flow graph is also often drawn as a flowchart with flow edges between basic-block nodes.

Exercise: Compiling to Control-Flow Graphs

Compile the following example program (for computing $\texttt{x}^\texttt{n}$) to a control-flow graph using the line numbers as control-flow locations:

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }

Compiling to Control-Flow Graphs

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }

flowchart TD
  L0((0))
  L1((1))
  L2((2))
  L3((3))
  L4((4))
  L5((5))
  L6((6))

  L0 -->|"r = 1"| L1
  L1 --> L2
  L2 -->|"assume !(n > 0)"| L6
  L2 -->|"assume n > 0"| L3
  L3 -->|"r = r * x"| L4
  L4 -->|"n = n - 1"| L5
  L5((5))
  L5 --> L2

Compiling to Control-Flow Graphs

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }

flowchart TD
  L0((0))
  L1((1))
  L2((2))
  Lcond{n > 0?}
  L3((3))
  L5((5))
  L6((6))

  L0 -->|"r = 1"| L1
  L1 --> L2
  L2 --> Lcond
  Lcond -->|T| L3
  Lcond -->|F| L6
  L3 -->|"r = r * x
n = n - 1"| L5
  L5 --> L2

Exercise: Transition Systems

\[ \text{(control-flow) state} \quad \sigma\quad \mathrel{::=}\quad \ell\colon \rho \]

Define the judgment form $\sigma \rightarrow_{g} \sigma'$ that says, “Control-flow state $\sigma$ steps to state $\sigma'$ in program $g$.”

Transition Systems

\[ \fbox{$\sigma \rightarrow_{g} \sigma'$} \]

$\inferrule[DoTransition]{ \rho \vdash c \Downarrow\rho' \and \ell \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell' \in g }{ \ell\colon \rho \rightarrow_{g} \ell'\colon \rho' }$

Exercise: Collecting Semantics: Forward Reachability

\[ \text{state properties} \quad \Sigma\quad \mathrel{::=}\quad \circ\mid\Sigma,\sigma \]

Define a collecting semantics $\lbrace\!\lbrack\!\lbrack g \rbrack\!\rbrack\!\rbrace = \lambda \Sigma.\Sigma'$ for forward reachability.

Collecting Semantics: Forward Reachability

\[ \fbox{$\lbrace\!\lbrack\!\lbrack g \rbrack\!\rbrack\!\rbrace = \lambda \Sigma.\Sigma'$} \]

\[ \begin{array}{rrl} \lbrace\!\lbrack\!\lbrack g \rbrack\!\rbrack\!\rbrace\,\Sigma_0 & \mathrel{:=}& \operatorname{lfp}\,(\lambda \Sigma. \Sigma_0 \cup \bigcup_{\sigma\in \Sigma} \{ \sigma' \mid \sigma \rightarrow_{g} \sigma' \} ) \end{array} \]

Exercise: Abstract Semantics: Forward Reachability

\[ \begin{array}{rrrl} \text{abstract states} & \hat{\sigma}& \mathrel{::=}& \ell\colon \hat{\rho} \\ \text{abstract state properties (i.e., program invariants)} & \hat{\Sigma}& \mathrel{::=}& \circ\mid\hat{\Sigma},\hat{\sigma} \end{array} \]

Define an abstract semantics $\lbrack\!\lbrack g \rbrack\!\rbrack^\sharp = \lambda \hat{\Sigma}.\hat{\Sigma}'$ for forward reachability.

Abstract Semantics: Forward Reachability

\[ \fbox{$\lbrack\!\lbrack g \rbrack\!\rbrack^\sharp = \lambda \hat{\Sigma}.\hat{\Sigma}'$} \]

\[ \begin{array}{rrl} \lbrack\!\lbrack g \rbrack\!\rbrack^\sharp\,\hat{\Sigma}_0 & \mathrel{:=}& \operatorname{lfp}\,(\lambda \hat{\Sigma}. \hat{\Sigma}_0 \mathbin{\sqcup} \bigsqcup_{\hat{\sigma}\in \hat{\Sigma}} \{ \hat{\sigma}' \mid \hat{\sigma} \rightarrow_{g} \hat{\sigma}' \} ) \end{array} \]

where $\mathbin{\sqcup}$ on abstract states $\ell\colon \hat{\rho}$ is a point-wise lifting of $\mathbin{\sqcup}$ on abstract stores $\hat{\rho}$ at each program location $\ell$.

The above is dataflow analysis — where $\hat{\rho}$ is restricted to a finite height domain.

Dataflow Analysis

Exercise: Dataflow Equations

For a given program $g$, the set of locations is finite. Describe a program invariant $\hat{\Sigma}$ as a finite set of dataflow equations for each program location. Assume an entry location $\ell_0$ to the procedure $g$ with an initial abstract store $\hat{\rho}_0$.

Dataflow Equations

Definition 1 (Dataflow Equations) An inductive program invariant $\hat{\Sigma}$ is a solution to the set of dataflow equations for program $g$ with an entry location $\ell_0$ and initial abstract store $\hat{\rho}_0$: \[ \begin{array}{rrll} \hat{\Sigma}( \ell_0 ) & = & \hat{\rho}_0 \\ \hat{\Sigma}( \ell_i ) & = & \bigsqcup_{ \ell_j \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell_i \in g} \{ \hat{\rho}' \mid \hat{\Sigma}(\ell_j) \vdash c \Downarrow\hat{\rho}' \} & \text{for $i \neq 0$} \end{array} \]

We solve a dataflow analysis by finding the least fixed point of the set of transfer functions (i.e., one for each program location $\ell_i$ where $i \neq 0$).

Exercise: Chaotic Iteration

\[ \begin{array}{rrll} \hat{\Sigma}( \ell_0 ) & = & \hat{\rho}_0 \\ \hat{\Sigma}( \ell_i ) & = & \bigsqcup_{ \ell_j \mathrel{\mathord{-}\mkern-4mu[c]\mkern-4mu\mathord{\rightarrow}} \ell_i \in g} \{ \hat{\rho}' \mid \hat{\Sigma}(\ell_j) \vdash c \Downarrow\hat{\rho}' \} & \text{for $i \neq 0$} \end{array} \]

The chaotic iteration algorithm computes the least fixed-point invariant $\hat{\Sigma}_\omega$ by iterating at each location $\ell_i$ until convergence.

r = 1;
while
  (n > 0) {
    r = r * x;
    n = n - 1;
  }

Fill in the following table where each $\hat{\Sigma}_i$ is the $i$th iterate using the sign abstract domain. Cell $\hat{\Sigma}_0(\ell_0)$ is $\hat{\rho}_0$, which consider to be $\top$.

	$\hat{\Sigma}_0$	$\hat{\Sigma}_1$	$\hat{\Sigma}_2$	$\hat{\Sigma}_3$	$\hat{\Sigma}_4$	$\hat{\Sigma}_5$	$\hat{\Sigma}_6$	$\hat{\Sigma}_7$	$\hat{\Sigma}_8$	$\hat{\Sigma}_9$	$\hat{\Sigma}_{10}$	$\hat{\Sigma}_\omega$
$\ell_0$	$\top$
$\ell_1$	$\bot$
$\ell_2$	$\bot$
$\ell_3$	$\bot$
$\ell_4$	$\bot$
$\ell_5$	$\bot$
$\ell_6$	$\bot$

	\(\hat{\Sigma}_0\)	\(\hat{\Sigma}_1\)	\(\hat{\Sigma}_2\)	\(\hat{\Sigma}_3\)	\(\hat{\Sigma}_4\)	\(\hat{\Sigma}_5\)	\(\hat{\Sigma}_6\)	\(\hat{\Sigma}_7\)	\(\hat{\Sigma}_8\)	\(\hat{\Sigma}_9\)	\(\hat{\Sigma}_{10}\)	\(\hat{\Sigma}_\omega\)
\(\ell_0\)	\(\top\)
\(\ell_1\)	\(\bot\)
\(\ell_2\)	\(\bot\)
\(\ell_3\)	\(\bot\)
\(\ell_4\)	\(\bot\)
\(\ell_5\)	\(\bot\)
\(\ell_6\)	\(\bot\)