FIFO Depth Tool (`fifo-depth`)¶

Overview¶

The ABE Uarch FIFO Depth Tool computes the optimal FIFO depth and flow-control thresholds for ASIC micro-architectures. It uses Constraint Programming - Satisfiability (CP-SAT) optimization to find minimal-area solutions that satisfy latency and congestion requirements.

Note: Throughout this document, "CP-SAT" is used consistently to refer to the constraint programming satisfiability solver.

Audience¶

ASIC designers who need to size FIFOs precisely under complex traffic profiles.
ASIC DV engineers who want to create stress scenarios or validate margin in performance simulations.

Purpose¶

Determine the smallest FIFO depth and thresholds that prevent underflow or overflow.
Generate witness sequences (read/write patterns) that show the limiting case.
For multi-clock FIFOs, find an optimal partition between asynchronous and synchronous storage.

Key Features¶

Unified solver supporting major flow-control protocols:
- Ready / Valid
- XON/XOFF
- Credit-Based Flow Control (CBFC)
Layered traffic profile input (cycle, transaction, burst, or stream).
Uses a deterministic algorithm to construct the worst-case write and read profiles.
Uses a mathematical solver (CP-SAT) to find the worst peak occupancy.
Proposes optimal Clock Domain Crossing (CDC) solution for multi-clock FIFOs.
Replay buffer baseline solver for future expansion.

Flat vs. Layered Specifications¶

Spec Type	Description	Use Case
Flat	Direct FIFO parameters, explicit bounds on total read/write data over a fixed window.	Simple, well-understood traffic or spreadsheet migration
Layered	Hierarchical profiles (cycle, transaction, burst, stream), tool derives worst-case from structure.	Complex, protocol-specific, or bursty traffic

When to Use Flat vs Layered¶

Traffic Type	Recommended Spec Type
Simple burst with known bounds	Flat
Protocol-modeled behavior	Layered
Unknown or exploratory behavior	Layered
Arbitrary bounds without structure	Flat

Known Limitations¶

Cannot yet model interface/storage quantum mismatches
Threshold optimization for XON/XOFF may have long runtimes for large horizons
No automatic data scaling for CP-SAT solver
CDC solver assumes Gray-coded pointers and synchronizers (not handshake-based async FIFOs)

Getting Started¶

Set Up and Install¶

See ABE Python Development for Python environment setup details.

make py-venv-all
source .venv/bin/activate
make py-install-all

Run Examples¶

fifo-depth src/abe/uarch/fifo_depth_examples/rv_layered.yaml
fifo-depth src/abe/uarch/fifo_depth_examples/cbfc_cdc.yaml

Examine Outputs¶

ls out_uarch_fd*
cat out_uarch_fd_rv_layered/results_scalars.json

Explore Relevant Directory Layout¶

.
├── src
│   └── abe
│       ├── uarch
│       │   ├── fifo_depth_examples
│       │   │   ├── cbfc_balanced.yaml
│       │   │   ├── cbfc_cdc.yaml
│       │   │   ├── cbfc_flat.yaml
│       │   │   ├── cbfc_layered.yaml
│       │   │   ├── replay_cdc.yaml
│       │   │   ├── replay.yaml
│       │   │   ├── rv_balanced.yaml
│       │   │   ├── rv_cdc.yaml
│       │   │   ├── rv_flat.yaml
│       │   │   ├── rv_layered.yaml
│       │   │   ├── xon_xoff_balanced.yaml
│       │   │   ├── xon_xoff_cdc.yaml
│       │   │   ├── xon_xoff_flat.yaml
│       │   │   └── xon_xoff_layered.yaml
│       │   ├── __init__.py
│       │   ├── fifo_depth_base.py
│       │   ├── fifo_depth_cbfc.py
│       │   ├── fifo_depth_cdc.py
│       │   ├── fifo_depth_ready_valid.py
│       │   ├── fifo_depth_replay.py
│       │   ├── fifo_depth_utils.py
│       │   ├── fifo_depth_xon_xoff.py
│       │   ├── fifo_depth.py
│       │   └── pkt_quantize.py

Conceptual Background¶

Scope¶

This tool models short-term congestion in data paths. The YAML specifications describe finite-duration traffic bursts where writes exceed reads. Over longer timescales, system-level flow control ensures balance; that equilibrium is beyond this tool’s scope.

A “balanced” spec (equal write/read densities) requires only minimal buffering for arbitration or phase alignment. Such cases may be detected and handled analytically. The main use case is the imbalanced case — temporary write oversubscription requiring real buffering.

Motivation¶

Traditional sizing methods rely on closed-form equations or spreadsheet estimation. Those approaches may be less accurate when traffic is bursty, multi-layered, or latency-constrained. This tool computes FIFO depth using a hybrid deterministic + solver-based approach. The key idea is to combine (a) domain-specific knowledge of worst-case congestion patterns with (b) an exact optimization stage using CP-SAT. This combination provides a practical and mathematically accurate solution.

For details on how layers compose and how worst-case patterns are generated, see How Layers Compose.

Unique Approach¶

Deterministic Construction of Worst-Case Write and Read Profiles¶

The user’s layered traffic specification (transaction, burst, and stream structure) is first compiled into binary valid profiles for the write and read sides. These profiles are not arbitrary: the tool intentionally creates worst-case congestion patterns using a set of deterministic rules.

Write-side Worst Case (maximizing clustering of data)¶

The write profile generator creates patterns that produce the densest possible windows of valid cycles:

Transactions inside a burst alternate between valid-first and gap-first to create long, contiguous data regions.
Bursts with data (D) and idles (I) are arranged using an (I,D) / (D,I) envelope based on the number of bursts:
- s_cnt = 1: (D,I)
- s_cnt = 2: (I,D) (D,I)
- s_cnt = 3: (I,D) (D,I) (D,I)
- s_cnt = 4: (I,D) (D,I) (I,D) (D,I)
This places idles at the start and end of the stream, while concentrating data in the interior.
For multi-burst streams, boundaries are constructed to create ...1 | 1... transitions (“Case-4” behavior), known to maximize FIFO occupancy.

Read-side Worst Case (delaying consumption)¶

The read profile generator clusters idle cycles to delay consumption:

Every burst begins with a gap (I,D) to ensure reads start as late as possible.
Transactions alternate so that gap boundaries align and create long idle runs.
Burst boundaries aggregate idles across the stream.
A causality guard rotates the read mask so that no read occurs before data can physically arrive, incorporating write latency.

The result is a pair of profiles (write_valid[], read_valid[]) that represent the worst traffic environment allowed by the user’s layered specification.

Constraint-Based Optimization Using CP-SAT¶

The tool first generates deterministic worst-case valid masks using a congestion-packing algorithm that alternates burst phases to maximize data clustering. These binary masks define which cycles are eligible for writes and reads.

Then, a cycle-accurate CP-SAT optimization stage determines exactly when writes and reads fire within these valid windows, while enforcing:

FIFO causality occ[t+1] = occ[t] + writes[t] - reads[t]
Non-negativity occ[t] ≥ 0
Throughput constraints
- total writes = sum_w
- total reads = sum_r
Latency constraints
- wr_latency, rd_latency
- synchronizer delays (CDC)

The solver’s objective is to maximize the peak occupancy under these constraints.

Because the deterministic masks are pre-computed input constraints that encode the worst-case timing structure, CP-SAT only needs to explore cycle-level scheduling inside those windows — which is computationally tractable while still exact.

Why Both Stages Are Necessary¶

Using only deterministic mask construction would give plausible worst cases, but could miss problematic write/read alignments that depend on exact cycle timing.

Using only CP-SAT would require the solver to reason about transactions, bursts, and streams directly, which is computationally impractical for realistic horizons.

The hybrid method combines the strengths of both:

Component	Purpose	Strength
Deterministic Worst-Case Mask Builder	Shapes adversarial write/read behavior	Fast, structured, captures known worst-case patterns
CP-SAT Optimization	Finds the mathematically worst alignment and firing schedule	Exact, global maximum of occupancy

Together, they compute FIFO depths that are tight, consistent, and reliable, yet fast enough for interactive use.

The hybrid deterministic + CP-SAT method provides a true worst-case bound for the reported depth.

Architecture¶

YAML spec files (flat / layered)
             |
             v
   Pydantic models + validation
             |
   ----------------------------
   |                          |
   v                          v
Traffic compiler        CDC solver
(layered → worst-case   (closed-form
 write/read masks)       CDC depth)
   |                          |
   v                          |
CP-SAT solvers                |
(Ready/Valid, XON/XOFF,       |
 CBFC, Replay)                |
   \                         /
    \                       /
     v                     v
        Core results (occ_peak,
        thresholds, credits…)
                     |
                     v
     Post-processing (atomic_tail +
          CDC + margin + rounding)
                     |
                     v
      JSON / CSV / PNG / log outputs

Each solver (Ready/Valid, XON/XOFF, CBFC, Replay, CDC) extends FifoSolver. Common data structures are defined in fifo_depth_base.py.

Typical Workflow¶

Start with a layered write/read spec
Run fifo-depth → observe occ_peak and witness
Apply margin/rounding
For CDC: use CDC solver first, then feed into sync FIFO solver
Integrate results in RTL

File Overview¶

File	Description
`fifo_depth.py`	Top-level orchestrator, CLI entrypoint
`fifo_depth_base.py`	Common models and solver framework
`fifo_depth_utils.py`	Shared helper functions
`fifo_depth_ready_valid.py`	Ready / Valid flow control solver
`fifo_depth_xon_xoff.py`	XON/XOFF flow control solver
`fifo_depth_cbfc.py`	Credit-based flow control solver
`fifo_depth_replay.py`	Replay buffer solver
`fifo_depth_cdc.py`	CDC composition and partitioning
`fifo_depth_examples/`	Example YAML configurations

Command Line Interface¶

Arguments¶

The fifo-depth tool accepts the following command-line arguments:

Argument	Type	Required	Default	Description
`spec`	positional	Yes	—	One or more YAML/JSON spec file path(s). You can specify multiple files.
`--outdir`	optional	No	See description	Output directory for results files (JSON, CSV, PNG) and log file. If not specified, outputs to "out_uarch_fd_\<spec-stem>, where \<spec-stem> is the stem of the spec file.
`--results-name`	optional	No	See description	Prefix for results filenames. The default prefix is the name of the solver results class.
`--verbosity`	optional	No	`info`	Logging level. Choices: `critical`, `error`, `warning`, `info`, `debug`.

Examples¶

# Basic usage with single spec file
fifo-depth src/abe/uarch/fifo_depth_examples/ready_valid.yaml

# Multiple spec files
fifo-depth spec1.yaml spec2.yaml

# With custom output directory and results prefix
fifo-depth my_spec.yaml --outdir ./results --results-name run1

# With debug logging
fifo-depth my_spec.yaml --verbosity debug

Input Specifications¶

Types of Specs¶

Two YAML specification forms are supported:

Type	Description
Flat	Direct FIFO parameters (simple cases).
Layered	Hierarchical read/write profiles (cycle, transaction, burst, stream).

Flat specifications provide explicit bounds on total read and write data over a fixed horizon. Users directly specify sum_w_min, sum_w_max, sum_r_min, and sum_r_max to limit the worst-case traffic pattern. This approach works well when traffic characteristics are well-understood or when migrating from spreadsheet-based analysis. Flat specs require the user to manually calculate aggregate traffic bounds.

Layered specifications define traffic patterns hierarchically through independent read and write profiles. Each profile can include multiple layers (e.g., cycle-level behavior, transaction grouping, burst patterns, stream periodicity) that combine to describe complex, realistic traffic. The tool automatically calculates the worst-case aggregate constraints from these layered descriptions. This approach is more intuitive for modeling protocol-specific behavior and allows the solver to explore combinations of layer parameters to find the most stressful scenario.

Data Quantities¶

The fifo-depth tool:

Assumes that the interface data quantum matches the FIFO storage data quantum.
Treats all data quantities as unitless integers.
Does not automatically scale data values for the CP-SAT solver.

Users must ensure that all write and read data parameters are specified in consistent units (e.g., bytes, words, flits) and at appropriate magnitudes for the solver.

Horizon Concept¶

The horizon defines the time window (in cycles) over which the solver analyzes traffic patterns and computes FIFO depth requirements. It represents the finite duration of the scenario being modeled — typically a burst or congestion event where write traffic temporarily exceeds read traffic.

Key characteristics:

Finite-duration analysis: The horizon captures a specific traffic scenario, not steady-state equilibrium. The solver finds the worst-case FIFO occupancy within this time window.
Flat specs: Users specify horizon directly as a positive integer. All traffic constraints (sum_w_min, sum_w_max, sum_r_min, sum_r_max) apply over this fixed window.
Layered specs: Horizon can be "auto" (recommended) or user-specified. Auto mode computes horizon based on the traffic pattern's natural periodicity (overall_period), the blind window (blind_window_cycles), and minimum repetition count (kmin_blocks). This ensures the solver examines sufficient pattern cycles to find worst-case alignment.
Longer horizons: Generally capture more realistic worst-case scenarios but increase solver runtime. For layered specs, the auto-computed horizon balances coverage and efficiency.

The horizon does not model long-term system-level flow control or equilibrium — it focuses on short-term congestion that requires buffering.

Horizon Sufficiency Check: The solver checks if the specified horizon may be too short to observe the maximum FIFO occupancy. The check uses this formula:

horizon >= (sum_w_max / w_max) + (sum_r_max / r_max)

This check ensures there are enough cycles to write the maximum write traffic (sum_w_max at rate w_max) and read the maximum read traffic (sum_r_max at rate r_max). If the horizon is shorter than this threshold, a warning is issued showing that the computed FIFO depth may be underestimated. This check is most important for protocols without flow control (e.g., ready_valid) where occupancy can reach sum_w_max. Protocols with flow control (e.g., xon_xoff, cbfc) typically have lower occupancy limits because of their flow control mechanisms.

Common Parameters¶

All solvers use these YAML parameters:

Parameter	Type	Required	Default	Description
`fifo_type`	string	Yes	—	FIFO protocol type. Choices: `ready_valid`, `xon_xoff`, `cbfc`, `replay`, `cdc`.
`margin_type`	string	No	`"absolute"`	Type of margin to apply. Choices: `percentage`, `absolute`.
`margin_val`	int	No	`0`	Margin value (non-negative). Interpreted based on `margin_type`.
`rounding`	string	No	`"none"`	Rounding strategy for final depth. Choices: `power2`, `none`.

Flat Spec Parameters¶

The Ready / Valid, XON / XOFF, and CBFC solvers use these YAML parameters for flat specs:

Parameter	Type	Required	Default	Description
`horizon`	int	Yes	—	Number of cycles to model (positive integer).
`wr_latency`	int	No	`0`	Write latency in cycles (non-negative).
`rd_latency`	int	No	`0`	Read latency in cycles (non-negative).
`w_max`	int	No	`1`	Maximum write data per cycle (non-negative).
`r_max`	int	No	`1`	Maximum read data per cycle (non-negative).
`sum_w_min`	int	Yes	—	Minimum total write data over horizon (non-negative).
`sum_w_max`	int	Yes	—	Maximum total write data over horizon (non-negative, ≥ `sum_w_min`).
`sum_r_min`	int	Yes	—	Minimum total read data over horizon (non-negative).
`sum_r_max`	int	Yes	—	Maximum total read data over horizon (non-negative, ≥ `sum_r_min`).

Layered Spec Parameters¶

The Ready / Valid, XON / XOFF, and CBFC solvers use these YAML parameters for layered specs:

Parameter	Type	Required	Default	Description
`horizon`	int or string	No	`"auto"`	Number of cycles to model. Can be `"auto"` (recommended) to automatically compute based on `overall_period`, `blind_window_cycles`, and `kmin_blocks`, or a user-specified positive integer (will be rounded up to multiple of `overall_period`).
`wr_latency`	int	No	`0`	Write latency in cycles (non-negative).
`rd_latency`	int	No	`0`	Read latency in cycles (non-negative).
`kmin_blocks`	int	No	`4`	Minimum number of complete periods to include in auto-computed horizon. Ensures sufficient repetition of layered traffic pattern. Must be ≥ 1.
`blind_window_cycles`	int	No	`0`	Number of cycles during which reads cannot observe writes (e.g., round-trip latency). Auto-computed horizon ensures at least `4 × blind_window_cycles` to capture worst-case behavior. Must be ≥ 0.

Layered Profiles¶

Layered specs specify independent write and read traffic profiles using a hierarchical structure. Both write_profile and read_profile use identical YAML parameters organized into four layers that build upon each other to create complex, realistic traffic patterns.

Layer Structure:

Each profile is composed of up to 4 layers that define traffic at increasing levels of granularity:

Cycle Layer (optional): Defines the maximum data items per cycle
Transaction Layer (required): Defines active and idle periods within a transaction
Burst Layer (required): Defines how transactions are grouped with inter-burst gaps
Stream Layer (optional): Defines how bursts are grouped with inter-stream gaps

Layer Parameters:

Layer	Parameter	Type	Required	Default	Description
Cycle	`cycle.max_items_per_cycle`	int	No	`1`	Maximum data items that can be transferred in a single cycle (≥ 1).
Transaction	`transaction.valid_cycles`	int	Yes	—	Number of active cycles in a transaction (≥ 0).
Transaction	`transaction.gap_cycles`	int	Yes	—	Number of idle cycles in a transaction (≥ 0).
Burst	`burst.transactions_per_burst`	int	Yes	—	Number of transactions in each burst (≥ 1).
Burst	`burst.gap_cycles`	int	Yes	—	Number of idle cycles between bursts (≥ 0).
Stream	`stream.bursts_per_stream`	int	No	`1`	Number of bursts in each stream (≥ 1).
Stream	`stream.gap_cycles`	int	No	`0`	Number of idle cycles between streams (≥ 0).

The fifo-depth tool composes write and read patterns to trigger the worst-case FIFO occupancy. See the How Layers Compose section of the Appendix for details.

Example:

write_profile:
  cycle:
    max_items_per_cycle: 2
  transaction:
    valid_cycles: 4
    gap_cycles: 2
  burst:
    transactions_per_burst: 8
    gap_cycles: 16
  stream:
    bursts_per_stream: 3
    gap_cycles: 32

This creates a write pattern where:

Up to 2 data items can be written per active cycle
Each transaction has 4 active cycles, then 2 idle cycles (period = 6)
Each burst has 8 transactions, then 16 idle cycles (period = 8×6 + 16 = 64)
Each stream has 3 bursts, then 32 idle cycles (period = 3×64 + 32 = 224)

Outputs¶

File	Description
`results_scalars.json`	Key numeric results (depth, thresholds, margins)
`results_witness.csv`	Read/write occupancy sequence producing worst-case depth
`results_plot.png`	Graphical witness visualization
`run.log`	Log of parameters, solver stats, and final results
`cdc_results_scalars.json`	Additional results for CDC partitioning

Example console output:

2025-11-13 07:34:09 - abe.uarch.fifo_depth_base - INFO - results:
{
  "basic_checks_pass": true,
  "msg": "",
  "depth": 36,
  "occ_peak": 32,
  "xon": 15,
  "xoff": 31,
  "t_star": 227
}
2025-11-13 07:34:10 - abe.utils - INFO - XonXoffResults: results.basic_checks_pass=True
2025-11-13 07:34:10 - abe.utils - INFO - Completed src/abe/uarch/fifo_depth_examples/xon_xoff_layered.yaml in 0:02:17

Common Results¶

The Ready / Valid, XON/XOFF, and CBFC solvers produce these common results:

Field	Type	Description
`basic_checks_pass`	bool	Internal validation flag showing whether the solver's self-consistency checks passed. When `true`, the results are valid. When `false`, an error occurred (e.g., peak occupancy mismatch in witness sequences) and the tool will raise an exception.
`msg`	str	Informational message about the solution method. For CP-SAT solvers, typically empty. For CDC solver, contains `"Analytic results."` to indicate closed-form calculation was used instead of constraint programming.
`depth`	int	Recommended minimum FIFO depth. Starts with `occ_peak`, then adds: `atomic_tail` (if applicable), CDC synchronizer depth (if applicable), margin (percentage or absolute), and rounding (power-of-2 if specified). This is the final value for FIFO sizing.
`occ_peak`	int	Peak occupancy computed by the CP-SAT solver. This is the worst-case number of data items simultaneously stored in the FIFO before any margins or adjustments are applied.
`w_seq`	List[int]	Write sequence witness: the number of data items written to the FIFO at each cycle. This time-series demonstrates the worst-case traffic pattern that produces `occ_peak`.
`r_seq`	List[int]	Read sequence witness: the number of data items read from the FIFO at each cycle. Combined with `w_seq`, this shows how the FIFO fills and drains over time.
`occ_seq`	List[int]	Occupancy sequence witness: the FIFO occupancy (number of stored items) at each cycle boundary. The maximum value in this sequence equals `occ_peak`. Useful for visualizing worst-case behavior and validating the solution.

Future Enhancements¶

Replay Solver Extensions¶

As noted in the Replay Solver section, the current implementation serves as a reference baseline for more complex future variants including:

Variable round-trip time (jitter, retry delays)
Credit windows smaller than bandwidth–delay product
Multiple senders sharing a replay buffer
Dynamic flow control and non-uniform transmission rates

Data Quantum and Units Support¶

The tool currently has these data-related limitations:

Assumes interface data quantum matches FIFO storage data quantum
Treats all data quantities as unitless integers
Does not automatically scale data values for CP-SAT solver efficiency

Storage Quantum Mismatch:

The first limitation can underestimate FIFO depth in certain protocols. For example, consider XON/XOFF flow control for Ethernet where FIFO storage is allocated in 64-byte words. Between when the receiver sends XOFF and the sender receives it, the worst case occurs when the sender bursts 65-byte Ethernet packets (64-byte payload + 1-byte overhead). These 65-byte packets use the most FIFO storage relative to bandwidth, but the tool currently cannot model this interface-to-storage quantum mismatch.

Units and Autoscaling:

Supporting units (similar to the pint-based approach used for CDC clock frequencies) would improve specification clarity and reduce user errors from unit inconsistencies.

Automatic scaling would simultaneously simplify YAML specifications and optimize CP-SAT solver performance by normalizing large or small data values to efficient integer ranges.

Implementation Approach:

The recommended enhancement strategy:

Add preprocessing stage to read user YAML, translate units, autoscale values, and generate normalized YAML for existing solvers
Keep current solver implementations unchanged
Add postprocessing stage to reverse scaling and restore units in results

This approach preserves the existing solver architecture while providing a cleaner user interface.

Ready / Valid Solver¶

Ready / Valid Purpose¶

In ABE, “Ready / Valid” is a shorthand name for a generic producer/consumer FIFO with no explicit flow-control protocol modeled inside the solver.

This mode applies to two common architectural situations:

Backpressured systems (e.g., actual Ready/Valid handshake)
- Overflow is prevented by deasserting ready.
- If the traffic pattern would overflow the FIFO, that means the design will be forced to stall the producer and miss performance goals.
Drop-tolerant systems (no backpressure)
- Overflow results in dropped data.
- If the traffic pattern would overflow the FIFO, that means the system will drop data that should not be dropped.

Despite these architectural differences, the FIFO depth requirement is the same:

The FIFO must be sized so that it never overflows under the worst-case write/read schedule within the analysis horizon.

The solver therefore focuses entirely on worst-case occupancy, not on a specific handshake protocol.

Ready / Valid Parameters¶

None beyond common parameters.

This mode assumes:

A known write-valid mask and read-valid mask over the horizon
A known write and read bandwidth
No explicit credit, pause, or replay mechanism

Ready / Valid Results¶

None beyond common results.

The depth returned guarantees:

No overflow over the horizon
Independent of whether the surrounding system uses Ready/Valid backpressure or allows drops

Ready / Valid Recommendations¶

Flat Specifications:

For flat (non-layered) specifications, the FIFO depth will always equal sum_w_max, because:

The write-valid and read-valid masks are constant (all 1s)
The producer can write every cycle
Without temporal structure, the worst case is “write the maximum burst before any reads”
Therefore the FIFO must buffer all writes in the horizon

In this simple case, an analytical calculation is sufficient. The CP-SAT solver yields the same result but provides no optimization benefit.

Balanced Specifications:

When the specification is balanced (minimum read density ≥ maximum write density), CP-SAT cannot produce a meaningful worst-case value:

The solver attempts to delay reads indefinitely to maximize occupancy
But in a balanced spec, reads can always catch up faster than writes
This leads to an unbounded or overly large search space

To avoid this, the tool automatically:

Detects balance using is_balanced()
Switches to a deterministic analytical solver (_get_results_analysis())

This analytical method:

Sweeps all possible write/read phase offsets
Accounts for write and read latency
Computes the true worst-case occupancy over the horizon
Produces a finite, meaningful required depth

Note:

Layered specifications with temporal structure (bursts, gaps, cycles) provide a richer worst-case analysis space where CP-SAT does provide benefit.
Flat and balanced cases are automatically handled using the appropriate analytical paths.

XON / XOFF Solver¶

XON / XOFF Purpose¶

Models XON/XOFF flow control protocols where the receiver uses threshold-based signaling to control the sender's transmission rate. In XON/XOFF:

The receiver monitors FIFO occupancy and asserts XOFF when occupancy reaches the xoff threshold, signaling the sender to pause
The sender reacts to XOFF after a reaction latency (react_latency), then stops or throttles transmission
When occupancy drains below the xon threshold, the receiver de-asserts XOFF (signals XON)
The sender resumes transmission after a resume latency (resume_latency)
Hysteresis between xon and xoff prevents rapid toggling (control chatter) when occupancy stays near a single threshold

XON / XOFF Overview¶

The XON/XOFF solver computes three critical values:

depth - The required FIFO buffer depth
xon - The threshold at which flow control is de-asserted (resume transmission)
xoff - The threshold at which flow control is asserted (pause transmission)

Both xon and xoff are essential hardware parameters. They determine:

Hysteresis behavior: The separation between pause and resume thresholds prevents rapid toggling of flow control
Throughput vs. depth trade-off: Lower thresholds reduce buffering but may cause more frequent pauses, reducing effective bandwidth
Reaction margin: Sufficient headroom above xoff to accommodate data written during the writer's reaction latency

Auto Mode vs. Manual Mode:

The solver supports two modes:

Auto Mode (thresholds="auto", default): The solver automatically computes optimal threshold values using constraint programming. It finds threshold configurations that satisfy throughput targets while minimizing FIFO depth, then explores the hysteresis range to balance depth, throughput, and control stability.
Manual Mode (thresholds="manual"): You specify fixed xon and xoff values, and the solver validates whether they provide sufficient flow control and computes the resulting FIFO depth.

How Auto-Optimization Works:

When thresholds="auto" (default):

The solver analyzes traffic patterns (write/read caps, latencies, atomic writes, throttle rates)
It calculates bounds for thresholds based on reaction/resume latencies and sustained throughput requirements
It explores candidate threshold pairs within the specified hysteresis range
It optimizes based on preferences: minimize depth (default), minimize hysteresis band (prefer_small_band), or minimize xoff (prefer_low_xoff)
It validates each candidate against throughput targets and requirements
It applies any specified margins and rounding to the final depth

Note on adaptive optimization: The solver automatically adjusts threshold search ranges based on traffic burstiness to help ensure adequate margin for reaction latency while avoiding over-provisioning.

XON / XOFF Parameters¶

XON / XOFF parameters include all common parameters plus these XON / XOFF-specific parameters:

Parameter	Type	Required	Default	Description
`atomic_tail`	int	No	`0`	Number of data items in the final atomic write transaction that cannot be interrupted (non-negative). When the writer begins this tail transaction, it must complete even if XOFF is asserted.
`react_latency`	int	No	`0`	Latency in cycles for the writer to react to XOFF assertion and stop writing (non-negative). Data continues to be written during this reaction period.
`resume_latency`	int	No	`0`	Latency in cycles for the writer to resume writing after XON assertion (non-negative). No data is written during this resume period.
`w_throttle_max`	int	No	`0`	Maximum number of data items that can be written per cycle while paused (non-negative). Default `0` means hard stop - no writes occur when paused.
`thresholds`	str	No	`"auto"`	Threshold configuration mode: `"manual"` or `"auto"`. In manual mode, `xon` and `xoff` must be specified. In auto mode, the solver optimizes threshold values.
`xon`	int or None	Conditional	`None`	XON threshold - occupancy level at which flow control is de-asserted and writer resumes (non-negative). Required when `thresholds="manual"`, ignored in auto mode.
`xoff`	int or None	Conditional	`None`	XOFF threshold - occupancy level at which flow control is asserted to pause the writer (non-negative). Required when `thresholds="manual"`, ignored in auto mode. Must be ≥ `xon` to ensure hysteresis.
`throughput_target`	float or "auto"	No	`"auto"`	Target throughput ratio (0.0 to 1.0) for auto mode. When `"auto"`, defaults to maximum achievable throughput. Lower values allow smaller FIFO depths at the cost of reduced performance.
`xon_min`	int or "auto"	No	`"auto"`	Minimum allowed XON threshold for auto mode (non-negative). When `"auto"`, solver computes the minimum based on resume latency and traffic patterns. Constrains the search space for threshold optimization.
`xoff_range`	List[int] or "auto"	No	`"auto"`	Range of XOFF values `[min, max]` to explore in auto mode. When `"auto"`, solver determines the range based on traffic patterns and throughput target. Allows constraining threshold search space.
`hysteresis`	List[float or int]	No	`[1.0, 1.5]`	Hysteresis range `[min, max]` as ratio between XOFF and XON thresholds. Used in auto mode to ensure separation between pause and resume levels. Both values must be ≥ 1.0, and `min ≤ max`. Larger hysteresis reduces control chatter but may increase depth.
`prefer_small_band`	bool	No	`false`	When `true`, auto mode prioritizes minimizing hysteresis band (XOFF - XON) over other optimization goals. Useful when minimizing control signal transitions is more important than depth.
`prefer_low_xoff`	bool	No	`false`	When `true`, auto mode prioritizes lower XOFF values over other optimization goals. Useful when minimizing peak occupancy before flow control engages is important.

XON / XOFF Results¶

XON / XOFF results include all common results plus these XON / XOFF-specific results:

Result	Description
`xon`	Computed or validated XON threshold - the FIFO occupancy level at which flow control is de-asserted to resume writing
`xoff`	Computed or validated XOFF threshold - the FIFO occupancy level at which flow control is asserted to pause writing
`throughput`	Achieved write throughput as a ratio in [0.0, 1.0], normalized to (horizon * w_max). It is populated by the solver after witness extraction.
`t_star`	Earliest cycle at which peak occupancy occurs. Used as a tiebreaker in auto mode optimization when multiple threshold configurations achieve the same depth. Lower values indicate the peak occurs earlier in the horizon.

XON / XOFF Recommendations¶

Auto Mode Benefits:

Auto mode (thresholds="auto") is recommended because:

Optimal Threshold Selection: The solver finds threshold configurations that minimize FIFO depth while meeting your throughput requirements, avoiding over-provisioning of buffer memory.
Correct Reaction Margins: The solver automatically accounts for reaction and resume latencies, ensuring sufficient headroom above xoff to handle data written during the pause reaction period.
Adaptive to Traffic Patterns: The solver analyzes your specific write/read profiles, atomic transactions, and throttle behavior to determine appropriate threshold values, rather than requiring manual calculation.
Hysteresis Optimization: The solver explores the specified hysteresis range to balance control stability (avoiding chatter) against depth minimization.

When to Use Manual Mode:

Manual mode is useful when:

You have pre-existing XON/XOFF protocol specifications that must be validated
Hardware constraints dictate specific threshold values (e.g., register width limitations)
You want to verify that a proposed threshold configuration is sufficient for your traffic

Recommendations:

Consider starting with auto mode to understand the optimal thresholds for your traffic pattern
Use hysteresis (default: [1.0, 1.5]) to control the minimum and maximum separation between xon and xoff
Set throughput_target below 1.0 if you're willing to trade some bandwidth for reduced FIFO depth
Use prefer_small_band=true to minimize hysteresis width if reducing control signal transitions is critical
Use prefer_low_xoff=true to minimize peak occupancy if you want flow control to engage earlier
Apply standard margin and rounding options to add implementation safety margin to the computed depth

CBFC Solver¶

CBFC Purpose¶

Models Credit-Based Flow Control (CBFC) protocols where the sender's transmission rate is regulated by credits returned asynchronously from the receiver. In CBFC:

The sender maintains a credit pool that is decremented with each write and incremented when credits are returned from the receiver
Credits are returned after the receiver consumes data, subject to a return latency (cred_ret_latency)
The sender can only transmit when sufficient credits are available
This provides backpressure without requiring a synchronous ready/valid handshake

CBFC Overview¶

The CBFC solver computes three critical values:

depth - The required FIFO buffer depth
cred_max - The maximum size of the credit pool
cred_init - The initial number of credits at startup

Both cred_max and cred_init are essential hardware implementation parameters. They determine:

Credit pool sizing: How many credits the sender must track (affects counter width or memory requirements)
Startup behavior: How much data can be transmitted before the first credits return from the receiver
Sustained throughput: Whether the credit return rate can support the desired bandwidth

Auto Mode vs. Manual Mode:

The solver supports two modes:

Auto Mode (cred_max="auto" and/or cred_init="auto", default): The solver automatically computes optimal credit values using constraint programming. It finds the minimal feasible credit configuration that satisfies the traffic requirements, then adds adaptive headroom based on read-valid gapiness patterns.
Manual Mode (explicit integer values): You specify fixed credit values, and the solver validates whether they provide sufficient flow control and computes the resulting FIFO depth.

How Auto-Optimization Works:

When cred_auto_optimize=true (default) with auto mode:

The solver analyzes traffic patterns (write/read caps, latencies, credit return latency)
It computes lower bounds for credits based on startup requirements and sustained throughput
It performs lexicographic minimization: first minimizes cred_init, then minimizes cred_max
It adds adaptive headroom derived from the read-valid pattern's maximum gap between valid slots
It applies any specified margins and rounding

Note on adaptive headroom: The solver automatically adjusts headroom based on read-valid gapiness to help ensure credits are available even when reads arrive in bursts rather than uniformly.

CBFC Parameters¶

CBFC parameters include all common parameters plus these CBFC-specific parameters:

Parameter	Type	Required	Default	Description
`cred_max`	int or "auto"	No	`"auto"`	Maximum credit pool size (non-negative). When `"auto"`, solver optimizes this value.
`cred_init`	int or "auto"	No	`"auto"`	Initial credit count at start (non-negative). When `"auto"`, solver optimizes this value. Must be ≤ `cred_max`.
`cred_gran`	int	No	`1`	Credit granularity - number of data units per credit (non-negative).
`cred_ret_latency`	int	No	`0`	Latency in cycles for credit return from receiver to sender (non-negative).
`cred_auto_optimize`	bool	No	`true`	Enable automatic optimization of credit parameters when set to `"auto"`.
`cred_headroom`	int	No	`2`	Additional headroom credits beyond computed minimum (non-negative). Used when auto-optimizing credits.
`cred_margin_type`	str	No	`"absolute"`	Type of margin to apply to credit values: `"percentage"` or `"absolute"`.
`cred_margin_val`	int	No	`0`	Margin value to add to credit calculations.
`cred_rounding`	str	No	`"none"`	Rounding strategy for credit values: `"power2"` or `"none"`.

CBFC Results¶

CBFC results include all common results plus these CBFC-specific results:

Result	Description
`cred_max`	Computed or validated maximum credit pool size
`cred_init`	Computed or validated initial credit count
`throughput`	Achieved write throughput as a ratio in [0.0, 1.0], normalized to (horizon * w_max). It is populated by the solver after witness extraction.

CBFC Recommendations¶

Auto Mode Benefits:

Auto mode (cred_max="auto" and cred_init="auto") is recommended because:

Optimal Credit Sizing: The solver finds the minimal credit configuration that satisfies your traffic requirements, avoiding over-provisioning of credit counters or pools.
Correct Startup Credits: The solver automatically accounts for credit return latency and startup requirements, ensuring sufficient initial credits to achieve desired throughput before the first credits return.
Adaptive to Traffic Patterns: The solver analyzes your specific write/read profiles and latencies to determine appropriate credit values, rather than requiring manual calculation.
Built-in Headroom: Adaptive headroom is automatically added based on read-valid gapiness, providing protection against traffic burstiness.

When to Use Manual Mode:

Manual mode is useful when:

You have pre-existing credit protocol specifications that must be validated
Hardware constraints dictate specific credit pool sizes (e.g., power-of-2 counter widths)
You want to verify that a proposed credit configuration is sufficient

Recommendations:

Consider starting with auto mode to understand the minimal credit requirements for your traffic pattern
Use cred_headroom (default: 2) to add safety margin beyond the computed minimum
Apply cred_rounding="power2" if your hardware prefers power-of-2 credit pool sizes
Use cred_margin_type and cred_margin_val to add implementation margin to credit values

CDC Solver¶

CDC Purpose¶

The CDC solver handles FIFO depth sizing when the write and read interfaces operate in different clock domains. It implements a two-stage approach:

Small CDC FIFO: Handles clock domain crossing with Gray-coded pointers and synchronizer latency
Large synchronous FIFO: Buffers traffic patterns in a single clock domain

This separation is preferred over a single asynchronous FIFO because:

Gray counter logic is simpler for small, power-of-2 depths
Most buffering depth can use simpler synchronous FIFO logic
The CDC solver calculates both the small CDC depth and initial conditions for the downstream synchronous solver

CDC Overview¶

The CDC solver computes three critical values:

depth - The required small CDC FIFO buffer depth
base_sync_fifo_depth - The minimum depth for the large synchronous FIFO based on long-term rate mismatch
wptr_cdc_cycles_in_wr - Write pointer CDC latency converted to write-domain cycles

These values enable a two-stage solution: the small CDC FIFO handles clock domain crossing, while the computed parameters inform stage 2's solution of the large synchronous FIFO.

Analytic Solution vs. CP-SAT:

Unlike other solvers (Ready/Valid, XON/XOFF, CBFC, Replay) which use CP-SAT constraint programming, the CDC solver uses closed-form analytical formulas. This works because:

Well-defined latencies: Synchronization stages, Gray code delays, and phase relationships have deterministic bounds
Independent components: Synchronizer depth, phase margin, and PPM drift can be calculated separately and summed
No optimization needed: There are no tunable parameters to optimize (like thresholds or credits) — the physics of clock domain crossing determines the requirements
Computational efficiency: Analytic formulas execute instantly, while CP-SAT would add unnecessary overhead for a problem with a unique deterministic solution

The CDC depth calculation sums three components:

Credit loop depth: Steady-state round-trip latency buffering required to sustain maximum write throughput. Accounts for write pointer synchronization into the read domain, read-side reaction time, read pointer synchronization back to the write domain, and write-side full-flag update.
Phase margin depth: One read-cycle worth of items to account for unknown initial phase relationship between clocks
PPM drift depth: Accumulated frequency drift over the horizon in both write and read domains

CDC Parameters¶

CDC parameters include all common parameters plus these CDC-specific parameters:

Parameter	Type	Required	Default	Description
`wr_clk_freq`	int, float, or str	Yes	-	Write clock frequency. Can be specified as integer Hz (e.g., `1000000000` for 1 GHz), float Hz, or string with units (e.g., `"1.1 GHz"`, `"100 MHz"`). Must be positive.
`rd_clk_freq`	int, float, or str	Yes	-	Read clock frequency. Can be specified as integer Hz, float Hz, or string with units. Must be positive.
`big_fifo_domain`	str	No	`"write"`	Clock domain for the large synchronous FIFO in stage 2: `"write"` or `"read"`. Determines which domain's cycles are used for the `horizon` and traffic pattern analysis.
`wr_clk_ppm`	int	No	`0`	Write clock frequency tolerance in parts-per-million (non-negative). Used to calculate worst-case drift over the horizon.
`rd_clk_ppm`	int	No	`0`	Read clock frequency tolerance in parts-per-million (non-negative). Used to calculate worst-case drift over the horizon.
`wptr_inc_cycles`	int	No	`1`	Write-domain cycles to increment write pointer after a write (non-negative).
`wptr_sync_latency_uncertainty`	int	No	`1`	Additional read-domain cycles for write pointer synchronizer latency uncertainty (non-negative). Accounts for ±1 cycle variation depending on when the pointer updates relative to the sampling clock edge. Default 1 covers worst-case; set to 0 for best-case analysis.
`wptr_sync_stages`	int	No	`2`	Number of synchronizer flip-flop stages for write pointer CDC crossing (non-negative). Typical values are 2-3 stages for MTBF requirements.
`rd_react_cycles`	int	No	`1`	Read-domain cycles for read logic to react after seeing new data (non-negative).
`rptr_inc_cycles`	int	No	`1`	Read-domain cycles to increment read pointer after a read (non-negative).
`rptr_sync_latency_uncertainty`	int	No	`1`	Additional write-domain cycles for read pointer synchronizer latency uncertainty (non-negative). Accounts for ±1 cycle variation depending on when the pointer updates relative to the sampling clock edge. Default 1 covers worst-case; set to 0 for best-case analysis.
`rptr_sync_stages`	int	No	`2`	Number of synchronizer flip-flop stages for read pointer CDC crossing (non-negative). Typical values are 2-3 stages for MTBF requirements.
`wr_full_update_cycles`	int	No	`1`	Write-domain cycles to update the full flag after read pointer synchronization (non-negative).
`window_cycles`	int or "auto"	No	`"auto"`	Horizon size in cycles for the `big_fifo_domain`. When `"auto"`, extracted from `horizon` field or compiled from layered profiles. Must be positive when specified as integer.

CDC Results¶

The CDC-specific results are:

Result	Description
`depth`	Required small CDC FIFO buffer depth (after margin and rounding). Sum of `credit_loop_depth` and `ppm_drift_depth`.
`credit_loop_depth`	Depth component for steady-state round-trip latency. Accounts for write pointer increment, write pointer CDC crossing (including latency uncertainty), read reaction, read pointer increment, read pointer CDC crossing (including latency uncertainty), and full flag update—all converted to items at the write rate.
`ppm_drift_depth`	Depth component for PPM frequency drift. Accumulated worst-case drift over the horizon in both write and read domains (combined).
`base_sync_fifo_depth`	Minimum depth for the large synchronous FIFO in stage 2. Based on long-term rate mismatch over the window: `window_cycles × items_per_cycle × max(0, 1 - f_rd/f_wr)`. Only positive when write clock is faster than read clock.
`wptr_cdc_cycles_in_wr`	Write pointer CDC latency (time for write pointer to become visible in read domain) converted to write-domain cycles. Used as initial condition parameter for stage 2 synchronous FIFO solver's `rd_latency` adjustment.

Results are saved in file cdc_results_scalars.json in the output directory.

CDC Recommendations¶

Margin and Rounding Strategy:

CDC FIFOs should typically be small and power-of-2 sized for efficient Gray counter implementation. However, automatic margin and rounding can lead to excessive sizing:

If the solver computes a depth of 13 and you specify 25% margin with power-of-2 rounding, the result will be 32 (potentially excessive)
Recommended approach: Set margin_val=0 and rounding="none" initially, review the computed depth, then manually select an appropriate power-of-2 size
For critical applications, consider adding 1-2 entries of margin, then rounding up to the next power-of-2

Clock Domain Selection:

The big_fifo_domain parameter determines which clock domain is used for the large synchronous FIFO in stage 2:

Set big_fifo_domain="write" (default) when the write clock is faster or when traffic patterns are naturally specified in write-domain cycles
Set big_fifo_domain="read" when the read clock is faster, as this may reduce the total depth of the large synchronous FIFO
The CDC FIFO depth is independent of this choice; only the stage 2 synchronous FIFO is affected

PPM Tolerance Specification:

Set wr_clk_ppm and rd_clk_ppm to match your oscillator specifications (e.g., 100 ppm for typical crystals)
Omit or set to 0 if clocks are derived from the same source (frequency-locked)
PPM drift accumulates over the horizon, so larger windows require proportionally more depth
Higher PPM values add safety margin but increase CDC FIFO size

Synchronizer Stage Configuration:

Default sync_stages=2 is standard for most applications and provides sufficient MTBF (Mean Time Between Failures)
Use sync_stages=3 for very high-speed designs or stringent reliability requirements
ptr_gray_extra=1 (default) accounts for Gray code sampling uncertainty; typically no need to change
Increasing these values improves reliability but increases synchronization latency and CDC depth

Analysis Window:

Use window_cycles="auto" (default) to inherit from the main horizon or layered profile specification
Manually specify window_cycles only if you need a different horizon than the synchronous FIFO
The window should cover the longest traffic burst or congestion period you want to handle

Replay Solver¶

Replay Purpose¶

Models replay buffers that hold unacknowledged (in-flight) data until acknowledgements (ACKs) arrive after a round-trip time (RTT). This is needed for protocols that require retransmission capability, where data must be kept until confirmed receipt.

Replay Overview¶

The replay FIFO model uses a CP-SAT formulation for consistency with other fifo_depth_* solvers. For the standard deterministic case (fixed RTT, single sender, continuous transmission at w_max), the solution is equivalent to the classic bandwidth–delay product (BDP):

peak_inflight = min(rtt, horizon - rtt) × w_max

The solver enforces that:

Writes transmit data into the replay buffer
Acknowledgements arrive exactly rtt cycles after the corresponding write
No new writes occur in the final rtt cycles (ensuring the buffer drains to zero by the horizon)
The buffer must accommodate all in-flight (unacknowledged) data

While the current implementation produces the same result as the analytical BDP formula, the CP-SAT solver serves as a reference implementation and extensibility placeholder for more complex future variants such as:

Variable rtt[t] (jitter, retry delays)
Credit windows smaller than BDP
Multiple senders sharing a replay buffer
Discrete protocol phases (pause, resume, retry)
Dynamic flow control rules (w[t] ≤ function of inflight)
Non-uniform w_max (burst shaping)
Overlapping atomic tail semantics

Future extensions will leverage this solver framework to capture those non-trivial dynamics without requiring new analytical derivations.

Replay Parameters¶

The Replay FIFO is fundamentally different from the previously described Ready / Valid, XON/XOFF, and CBFC FIFOs. It only supports flat specs (not layered profiles) and models the transmission/acknowledgement pattern directly. Layered replay is not currently supported because replay buffers inherently depend on temporal ACK structure rather than transaction/burst layering.

Replay supports the common parameters fifo_type, margin_type, margin_val, and rounding and these replay-specific parameters:

Parameter	Type	Required	Default	Description
`horizon`	int	Yes	—	Number of cycles to model (positive integer). Must be ≥ `rtt`.
`w_max`	int	No	`1`	Maximum write (transmit) data per cycle (non-negative). Represents the maximum bandwidth of the sender.
`atomic_tail`	int	No	`0`	Additional buffer space to reserve beyond the computed in-flight peak (non-negative). Useful for protocol-specific padding or alignment requirements.
`rtt`	int	Yes	—	Round-trip time in cycles (positive integer). The delay between a data transmission and its corresponding acknowledgement. Must be ≤ `horizon`.

Replay Results¶

The replay solver results are fundamentally the same as the common FIFO results in principle, but use more precise terminology specific to replay buffers. In a replay buffer context:

Occupancy (occ) → Inflight (infl): The amount of data that has been transmitted but not yet acknowledged
Read (r) → Acknowledgement (ack/a): The signal indicating data has been received and can be removed from the replay buffer
Write (w) remains Write (w): The transmission of new data

The mapping between common FIFO results and replay-specific results is:

Common Result	Replay Result	Description
`depth`	`depth`	Required buffer depth
`occ_peak`	`infl_peak`	Peak inflight data (maximum unacknowledged transmissions)
`w_seq`	`w_seq`	Write (transmit) sequence over time
`r_seq`	`a_seq`	Read (acknowledgement) sequence over time
`occ_seq`	`infl_seq`	Occupancy (inflight) sequence over time

Why CP-SAT Yields the Same Result as the BDP Equation:

The CP-SAT solver produces the same result as the analytical bandwidth–delay product (BDP) calculation because of the specific constraints in the current Replay model:

Deterministic Acknowledgements: Every write at cycle t generates an acknowledgement at cycle t + rtt. This creates a perfectly predictable in-flight pattern.
Maximum Transmission Rate: The solver maximizes the objective (peak in-flight data) by transmitting at w_max continuously during the valid transmission window.
Drain Constraint: The requirement that no writes occur in the final rtt cycles (ensuring inflight[horizon] == 0) creates a symmetric ramp-up and ramp-down pattern.
Peak Occurs at Predictable Time: Under continuous transmission at w_max, in-flight data accumulates linearly until either:
Time rtt (first acknowledgements arrive), or
Time horizon - rtt (last writes that will be acknowledged before horizon)

The peak is therefore min(rtt, horizon - rtt) × w_max.

No Alternative Strategies: The CP-SAT solver explores the solution space but finds no transmission pattern that exceeds this peak. Any change from continuous transmission at w_max (e.g., pausing or reducing transmission rate) would only reduce the in-flight accumulation.

The CP-SAT formulation serves as a reference implementation that validates the analytical formula and provides a foundation for future extensions where analytical solutions may not exist (variable RTT, multiple senders, flow control, etc.).

Advanced Topics¶

Performance Considerations¶

Solver runtime depends on several factors:

Horizon Length: Longer horizons increase the number of variables and constraints, leading to longer solve times. For layered specs, auto-computed horizons that cover many periods can significantly increase runtime. Layered specs generate multiple periodicities whose least common multiple (LCM) determines the horizon, which directly increases solver variables.
Layer Complexity: Each additional layer (cycle, transaction, burst, stream) increases the combinatorial space the solver must explore. Deeply layered profiles with many bursts or streams are more computationally intensive.
Spec Type: Flat specs are typically solved in seconds, as they have fewer variables and a simpler structure. Layered specs, especially with large horizons or many periods, may take from seconds to several minutes.
Traffic Burstiness: Highly bursty or imbalanced profiles can create more challenging optimization problems, increasing runtime.
Parameter Choices: Using "auto" for horizon or specifying a large number of periods can unintentionally create very large problem sizes. Manually limiting the horizon or simplifying profiles can improve performance.

Recommendations for Efficient Solving:

Consider starting with flat specs or small horizons to validate basic behavior.
Use layered specs for detailed analysis, but minimize the number of periods and layers where possible.
If solve time is excessive, consider reducing the horizon, simplifying the traffic profile, or using fewer bursts/streams.
Monitor solver logs for warnings about horizon sufficiency or problem size.

In most practical cases, flat and moderately layered specs solve in under a minute. Only highly complex, deeply layered, or very long-horizon cases should require longer runtimes.

How Layers Compose¶

The tool builds a binary valid pattern (0/1 sequence) by composing layers from innermost (transaction) to outermost (stream):

Transaction period = valid_cycles + gap_cycles
Burst = transactions_per_burst transactions + gap_cycles
Stream = bursts_per_stream bursts + gap_cycles
Overall period = LCM of write and read stream periods

Worst-Case Pattern Generation:

The tool uses different strategies based on whether the profile has a single burst or multiple bursts, implementing patterns from FIFO depth calculation literature to create maximum congestion:

For Write Profiles (maximizing FIFO fill):

Single Burst (bursts_per_stream == 1):

Uniform valid-first pattern: All transactions use valid cycles followed by gap cycles
Maximizes clustering of valid cycles within the burst for safe depth estimation
No burst boundaries exist to optimize

Multiple Bursts (bursts_per_stream > 1):

Transaction-level alternation: Transactions within each burst alternate between gap-first and valid-first patterns to achieve valid clustering at burst boundaries
- Pattern choice depends on burst index parity and transactions_per_burst parity
- Odd-indexed bursts are configured to end with valid cycles
- Even-indexed bursts ≥ 2 are configured to start with valid cycles
Burst-level alternation: Bursts place gaps strategically to concentrate data in the middle of the observation window
- First burst (index 0): gap followed by transactions (I,D) - places idle at start
- Last burst: transactions followed by gap (D,I) - places idle at end
- Middle bursts alternate:
  - Odd-indexed middle bursts: transactions followed by gap (D,I)
  - Even-indexed middle bursts: gap followed by transactions (I,D)
- Creates worst-case patterns by concentrating data centrally:
  - 2 bursts: (I,D)(D,I)
  - 3 bursts: (I,D)(D,I)(D,I)
  - 4 bursts: (I,D)(D,I)(I,D)(D,I)

For Read Profiles (maximizing FIFO drain delay):

Single Burst (bursts_per_stream == 1):

Uniform gap-first pattern: All transactions use gap cycles followed by valid cycles
Maximizes delay by front-loading idle cycles within the burst

Multiple Bursts (bursts_per_stream > 1):

Transaction-level alternation: Transactions alternate to cluster idle cycles at boundaries
- Alternation pattern based on combined (burst_index + transaction_index) parity
- Even parity: valid-first then gap (boundary with next gap-first forms larger idle)
- Odd parity: gap-first then valid
Burst-level alternation: All burst gaps placed at start to maximize read delay

Causality Preservation:

Write patterns never start with gaps (first burst begins with transactions)
Read patterns include warmup rotation based on write latency to ensure reads occur after writes
Guarantees cumulative writes ≥ cumulative reads at all times

This hybrid approach (uniform for single burst, alternating for multi-burst) ensures conservative FIFO depth estimates across all configurations.

FAQ¶

If you encounter performance issues, the first four questions may be helpful to review first.

Why does the solver sometimes take several minutes to run?¶

Solver runtime is dominated by:

Horizon length — longer windows → more variables → longer solve time
Layer complexity — cycle + transaction + burst + stream generates deeply nested patterns
Burstiness / imbalance — highly adversarial patterns increase the CP-SAT search space
XON/XOFF auto-threshold search — exploring multiple (xon, xoff) candidates can multiply solve time

Flat specifications typically solve in seconds. Layered profiles with multiple streams/periods can take minutes. This is normal and expected based on CP-SAT complexity.

Why does `horizon: auto` sometimes produce very large horizons?¶

Because the tool must ensure it covers all worst-case alignments between the write and read patterns. The auto logic includes:

multiple full pattern periods (kmin_blocks)
blind window coverage (blind_window_cycles)
LCM of read/write stream periods
safety padding

This produces a horizon long enough to ensure correctness. If runtime is too long, you may manually set a smaller horizon—but note that making it too small may underestimate FIFO depth.

Why does the depth equal `sum_w_max` for flat ready/valid?¶

Flat ready/valid has:

constant write-valid = 1
constant read-valid = 1
no temporal structure to exploit

The worst case is:

depth = sum_w_max

Because all writes can occur before any reads.

Why does the solver switch from CP-SAT to analytic mode for balanced specs?¶

A spec is balanced when:

(min read density) ≥ (max write density)

In this case:

reads can always drain faster than writes can fill
CP-SAT can delay reads arbitrarily → problem becomes poorly defined

So the solver automatically switches to a deterministic analytic solver that:

sweeps phase alignment
includes latencies
calculates worst-case occupancy deterministically

This produces a stable, meaningful result.

Why does the XON/XOFF solver choose a higher `xoff` than expected?¶

Usually because:

react_latency > 0
atomic_tail > 0
the write pattern is very bursty
throughput target must be met
hysteresis constraints limit low values
low xoff would not be feasible

The solver must leave headroom above xoff to absorb data written during the reaction latency.

So xoff increases when mathematically required for correctness.

Why are thresholds or credits ignored when using auto mode?¶

In auto mode:

thresholds: auto
cred_max: auto
cred_init: auto

the solver intentionally ignores user-specified values and computes optimal ones.

To force manual values:

thresholds: manual
xon: <value>
xoff: <value>

or for CBFC:

cred_auto_optimize: false
cred_max: <value>
cred_init: <value>

Why is peak occupancy lower than expected?¶

Common reasons:

burst/gap structure limits congestion
XON/XOFF throttling reduces peaks
CBFC credit return naturally controls writes
CDC FIFO is limited by synchronizer depth

Many spreadsheet estimates are pessimistic; CP-SAT often finds a tighter true bound.

Why is peak occupancy higher than expected?¶

Often due to:

adversarial phase alignment
large blind window
write/read latency asymmetry
clustered burst boundaries
XON/XOFF reaction overshoot
CBFC credit-return latency

The solver explores worst-case timing that may not be immediately intuitive.

Why does XON/XOFF auto mode take longer than other solvers?¶

Because auto mode must:

Compute feasible xon_min
Compute feasible xoff range
Sweep hysteresis space
Solve CP-SAT per candidate
Choose the lexicographically optimal configuration

Manual threshold mode is much faster.

Why is replay modeled with CP-SAT if the result is just BDP?¶

Replay currently follows deterministically from:

peak_inflight = min(rtt, horizon - rtt) × w_max

But CP-SAT is used for:

consistency with other solvers
reuse of infrastructure
future support of jitter, retries, multi-sender cases

BDP is just the simplest member of a larger family of replay behaviors.

Why does CDC always use closed-form math instead of CP-SAT?¶

CDC behavior is governed by:

synchronizer latency
Gray code pointer stability
phase uncertainty
PPM drift

These have strict analytic bounds and no adversarial scheduling. Closed-form computation is exact and instantaneous.

Why does rounding sometimes jump depth by a large factor?¶

Two features cause this:

Margin (percentage or absolute)
Rounding (power2)

Example:

depth_raw = 13
25% margin → 16.25 → 17
round to power of 2 → 32

The tool prioritizes safety and hardware alignment.

To avoid over-sizing:

disable rounding
or apply zero margin
inspect raw depth first
then manually choose rounding

Why does the tool warn that the horizon may be too small?¶

Because true peak occupancy might occur beyond the examined window.

The heuristic check:

horizon >= (sum_w_max / w_max) + (sum_r_max / r_max)

warns when the user-specified horizon may miss the true worst case.

Why are witness traces sometimes non-intuitive?¶

Because the solver:

Builds deterministic worst-case valid masks
Then schedules writes/reads adversarially within those masks
Maximizes occupancy

This leads to witness patterns that are mathematically valid but may not always follow expected patterns. The witness ensures correctness for the computed FIFO depth.

Appendix¶

Key Classes¶

Class	Defined In	Role
`FifoSolver`	`fifo_depth_base.py`	Abstract base class for all solvers
`FifoParams`	`fifo_depth_base.py`	Parameter model validated from YAML
`FifoResults`	`fifo_depth_base.py`	Container for solver results
`CdcSolver`	`fifo_depth_cdc.py`	CDC partition computation
`CbfcSolver`	`fifo_depth_cbfc.py`	Credit-based FIFO solver

References¶

Google OR-Tools CP-SAT Solver

Literature¶

[1] W. Dally and B. Towles, Principles and Practices of Interconnection Networks. San Francisco, CA: Morgan Kaufmann, 2004.

[2] A. DeJans Jr, The MILP Optimization Handbook: An Introduction to Linear and Integer Programming for Practitioners. Bit Bros LLC, 2025.

[3] C. Soviani and S. A. Edwards, "FIFO sizing for high-performance pipelines," in Proc. 16th Int. Workshop Logic Synth., San Diego, CA, 2007.

[4] R. Lu and C. K. Koh, "Performance optimization of latency insensitive systems through buffer queue sizing of communications channels," in Proc. Int. Conf. Comput. Aided Design, San Jose, CA, 2003.

[5] "Calculation of FIFO depth - made easy." Accessed: Nov. 23, 2025. [Online]. Available: https://hardwaregeeksblog.wordpress.com/wp-content/uploads/2016/12/fifodepthcalculationmadeeasy2.pdf

[6] Chipressian, Crack the Hardware Interview - from RTL Designers' Perspective: Architecture and Micro-architecture. Middletown, DE: Chipressian Publishing, 2025.

Licensing¶

See the LICENSES directory at the repository root.

FIFO Depth Tool (fifo-depth)¶