Proceedings of the ACM on Programming Languages: Vol. 8, No. PLDI. 2024

Full Citation in the ACM Digital Library


Input-Relational Verification of Deep Neural Networks

We consider the verification of input-relational properties defined over deep neural networks (DNNs) such as robustness against universal adversarial perturbations, monotonicity, etc. Precise verification of these properties requires reasoning about multiple executions of the same DNN. We introduce a novel concept of difference tracking to compute the difference between the outputs of two executions of the same DNN at all layers. We design a new abstract domain, DiffPoly for efficient difference tracking that can scale large DNNs. DiffPoly is equipped with custom abstract transformers for common activation functions (ReLU, Tanh, Sigmoid, etc.) and affine layers and can create precise linear cross-execution constraints. We implement an input-relational verifier for DNNs called RaVeN which uses DiffPoly and linear program formulations to handle a wide range of input-relational properties. Our experimental results on challenging benchmarks show that by leveraging precise linear constraints defined over multiple executions of the DNN, RaVeN gains substantial precision over baselines on a wide range of datasets, networks, and input-relational properties.

Modular Hardware Design of Pipelined Circuits with Hazards

Modular design is critical in reducing hardware designer's cognitive load and development cost. However, it is challenging to modularize high-performance pipelined circuits with structural, data, and control hazards because their resolution---stalling, and bypassing, and discard-and-restarting---introduce cross-stage dependencies. The dependencies could potentially mandate monolithic control logic and create combinational loops, hindering modular design. An effective method to modularize pipelined circuits is valid-ready interfaces, but they apply to a relatively simple form of pipelined circuits only with structural hazards. We propose hazard interfaces, a generalization of valid-ready interfaces that can modularize pipelined circuits not only with structural but also with data and control hazards. The key idea is enveloping the cross-stage dependencies within interfaces. We also design combinators for hazard interfaces in the style of map-reduce that facilitate decomposition of control logic. We implement a compiler (to synthesizable Verilog) for a prototype language supporting hazard interfaces and combinators, and design a sound and efficient type checker that proves the absence of combinational loops. With case studies on 5-stage RISC-V CPU core and 100 Gbps Ethernet NIC, we demonstrate that hazard interfaces indeed facilitate modular design while incurring no noticeable cost in performance, power, and area over reference designs in Chisel and Verilog.

Verified Extraction from Coq to OCaml

One of the central claims of fame of the Coq proof assistant is extraction, i.e. the ability to obtain efficient programs in industrial programming languages such as OCaml, Haskell, or Scheme from programs written in Coq’s expressive dependent type theory. Extraction is of great practical usefulness, used crucially e.g. in the CompCert project. However, for such executables obtained by extraction, the extraction process is part of the trusted code base (TCB), as are Coq’s kernel and the compiler used to compile the extracted code. The extraction process contains intricate semantic transformation of programs that rely on subtle operational features of both the source and target language. Its code has also evolved since the last theoretical exposition in the seminal PhD thesis of Pierre Letouzey. Furthermore, while the exact correctness statements for the execution of extracted code are described clearly in academic literature, the interoperability with unverified code has never been investigated formally, and yet is used in virtually every project relying on extraction. In this paper, we describe the development of a novel extraction pipeline from Coq to OCaml, implemented and verified in Coq itself, with a clear correctness theorem and guarantees for safe interoperability. We build our work on the MetaCoq project, which aims at decreasing the TCB of Coq’s kernel by re-implementing it in Coq itself and proving it correct w.r.t. a formal specification of Coq’s type theory in Coq. Since OCaml does not have a formal specification, we make use of the project specifying the semantics of the intermediate language of the OCaml compiler. Our work fills some gaps in the literature and highlights important differences between the operational semantics of Coq programs and their extraction. In particular, we focus on the guarantees that can be provided for interoperability with unverified code, and prove that extracted programs of first-order data type are correct and can safely interoperate, whereas for higher-order programs already simple interoperations can lead to incorrect behaviour and even outright segfaults.

Robust Resource Bounds with Static Analysis and Bayesian Inference

There are two approaches to automatically deriving symbolic worst-case resource bounds for programs: static analysis of the source code and data-driven analysis of cost measurements obtained by running the program. Static resource analysis is usually sound but incomplete. Data-driven analysis can always return a result, but its lack of robustness often leads to unsound results. This paper presents the design, implementation, and empirical evaluation of hybrid resource bound analyses that tightly integrate static analysis and data-driven analysis. The static analysis part builds on automatic amortized resource analysis (AARA), a state-of-the-art type-based resource analysis method that performs cost bound inference using linear optimization. The data-driven part is rooted in novel Bayesian modeling and inference techniques that improve upon previous data-driven analysis methods by reporting an entire probability distribution over likely resource cost bounds. A key innovation is a new type inference system called Hybrid AARA that coherently integrates Bayesian inference into conventional AARA, combining the strengths of both approaches. Hybrid AARA is proven to be statistically sound under standard assumptions on the runtime cost data. An experimental evaluation on a challenging set of benchmarks shows that Hybrid AARA (i) effectively mitigates the incompleteness of purely static resource analysis; and (ii) is more accurate and robust than purely data-driven resource analysis.

Recursive Program Synthesis using Paramorphisms

We show that synthesizing recursive functional programs using a class of primitive recursive combinators is both simpler and solves more benchmarks from the literature than previously proposed approaches. Our method synthesizes paramorphisms, a class of programs that includes the most common recursive programming patterns on algebraic data types. The crux of our approach is to split the synthesis problem into two parts: a multi-hole template that fixes the recursive structure, and a search for non-recursive program fragments to fill the template holes.

A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption

Fully Homomorphic Encryption (FHE) enables computing on encrypted data, letting clients securely offload computation to untrusted servers. While enticing, FHE has two key challenges that limit its applicability: it has high performance overheads (10,000× over unencrypted computation) and it is extremely hard to program. Recent hardware accelerators and algorithmic improvements have reduced FHE’s overheads and enabled large applications to run under FHE. These large applications exacerbate FHE’s programmability challenges. Writing FHE programs directly is hard because FHE schemes expose a restrictive, low-level interface that prevents abstraction and composition. Specifically, FHE requires packing encrypted data into large vectors (tens of thousands of elements long), FHE provides limited operations on these vectors, and values have noise that grows with each operation, which creates unintuitive performance tradeoffs. As a result, translating large applications, like neural networks, into efficient FHE circuits takes substantial tedious work. We address FHE’s programmability challenges with the Fhelipe FHE compiler. Fhelipe exposes a simple, numpy-style tensor programming interface, and compiles high-level tensor programs into efficient FHE circuits. Fhelipe’s key contribution is automatic data packing, which chooses data layouts for tensors and packs them into ciphertexts to maximize performance. Our novel framework considers a wide range of layouts and optimizes them analytically. This lets compile large FHE programs efficiently, unlike prior FHE compilers, which either use inefficient layouts or do not scale beyond tiny programs. We evaluate on both a state-of-the-art FHE accelerator and a CPU. is the first compiler that matches or exceeds the performance of large hand-optimized FHE applications, like deep neural networks, and outperforms a state-of-the-art FHE compiler by gmean 18.5. At the same time, dramatically simplifies programming, reducing code size by 10–48.

Concurrent Immediate Reference Counting

Memory management for optimistic concurrency in unmanaged programming languages is challenging. Safe memory reclamation (SMR) algorithms help address this, but they are difficult to use correctly. Automatic reference counting provides a simpler interface, but it has been less efficient than SMR algorithms. Recently, there has been a push to apply the optimizations used in garbage collectors for managed languages to elide reference count updates from local references. Notably, Fast Reference Counter, OrcGC, and Concurrent Deferred Reference Counting use SMR algorithms to protect local references by deferring decrements or reclamation. While they show a significant performance improvement, their use of deferral may result in growing memory usage due to slow reclamation of linked structures, and suboptimal performance in update-heavy workloads.

We present Concurrent Immediate Reference Counting (CIRC), a new combination of SMR algorithms with reference counting. CIRC employs deferral like other modern methods, but it avoids their problems with novel algorithms for (1) immediately reclaiming linked structures recursively by tracking the reachability of each object, and (2) applying decrements immediately and deferring only the reclamation. Our experiments show that CIRC’s memory usage does not grow over time and is only slightly higher than the underlying SMR. Moreover, CIRC further narrows the performance gap between the underlying SMR, positioning it as a promising solution to safe automatic memory management for highly concurrent data structures in unmanaged languages.

A Proof Recipe for Linearizability in Relaxed Memory Separation Logic

Linearizability is the de facto standard for correctness of concurrent objects—it essentially says that all the object’s operations behave as if they were atomic. There have been a number of recent advances in developing increasingly strong linearizability specifications for relaxed memory consistency (RMC), but scalable proof methods for these specifications do not exist due to the challenges arising from out-of-order executions (requiring event reordering) and selected synchronization (requiring tracking of view transfers).

We propose a proof recipe for the linearizable history specifications by Dang et al. in the Iris-based iRC11 concurrent separation logic in Coq. Key to our proof recipe is the notion of object modification order (OMO), which generalizes the modification order of the C11 memory model to an object-local setting. Using OMO we minimize the conditions that need to be proved for event reordering. To enable proof reuse for concurrent libraries that are built on top of others, OMO provides the novel notion of a commit-with relation that connects the linearization points of the lower and upper libraries. Using our recipe, we verify the linearizability of the Michael–Scott queue, the elimination stack, and Folly’s MPMC queue in RMC for the first time; and verify stronger specifications of a spinlock and atomic reference counting in RMC than prior work.

Diffy: Data-Driven Bug Finding for Configurations

Configuration errors remain a major cause of system failures and service outages. One promising approach to identify configuration errors automatically is to learn common usage patterns (and anti-patterns) using data-driven methods. However, existing data-driven learning approaches analyze only simple configurations (e.g., those with no hierarchical structure), identify only simple types of issues (e.g., type errors), or require extensive domain-specific tuning. In this paper, we present Diffy, the first push-button configuration analyzer that detects likely bugs in structured configurations. From example configurations, Diffy learns a common template, with "holes" that capture their variation. It then applies unsupervised learning to identify anomalous template parameters as likely bugs. We evaluate Diffy on a large cloud provider's wide-area network, an operational 5G network testbed, and MySQL configurations, demonstrating its versatility, performance, and accuracy. During Diffy's development, it caught and prevented a bug in a configuration timer value that had previously caused an outage for the cloud provider.

Boosting Compiler Testing by Injecting Real-World Code

We introduce a novel approach for testing optimizing compilers with code from real-world applications. The main idea is to construct well-formed programs by fusing multiple code snippets from various real-world projects. The key insight is backed by the fact that the large volume of real-world code exercises rich syntactical and semantic language features, which current engineering-intensive approaches like random program generators are hard to fully support. To construct well-formed programs from real-world code, our approach works by (1) extracting real-world code at the granularity of function, (2) injecting function calls into seed programs, and (3) leveraging dynamic execution information to maintain the semantics and build complex data dependencies between injected functions and the seed program. With this idea, our approach complements the existing generators by boosting their expressiveness via fusing real-world code in a semantics-preserving way.

We implement our idea in a tool, Creal, to test C compilers. In a nine-month testing period, we have reported 132 bugs to GCC and LLVM, two of the most popular and well-tested C compilers. At the time of writing, 121 of them have been confirmed as unknown bugs, and 101 of them have been fixed. Most of these bugs were miscompilations, and many were recognized as long-latent and critical. Our evaluation results evidently demonstrate the significant advantage of using real-world code to stress-test compilers. We believe this idea will benefit the general compiler testing direction and will be directly applicable to other compilers.

SMT Theory Arbitrage: Approximating Unbounded Constraints using Bounded Theories

SMT solvers are foundational tools for reasoning about constraints in practical problems both within and outside program analysis. Faster SMT solving improves the performance of practical tools and expands the set of tractable problems. Existing approaches to improving solver performance either focus on general algorithms applied below the level of individual theories, or focus on optimizations within a single theory. Unbounded constraints in which the number of possible variable values is infinite, such as real numbers and integers, pose a particularly difficult challenge for solvers. Bounded constraints in which the set of possible values is finite, such as bitvectors and floating-point numbers, on the other hand, are decidable and have been the subject of extensive performance improvement efforts.

This paper introduces a theory arbitrage: we transform unbounded constraints, which are often expensive to solve, into bounded constraints, which are typically cheaper to solve. By converting unbounded problems into bounded ones, theory arbitrage takes advantage of better performance on bounded constraints and unlocks optimization techniques that only apply to bounded theories. The transformation is achieved by harnessing a novel abstract interpretation strategy to infer bounds. The bounded transformed constraint is then an underapproximation of the semantics of the unbounded original. We realize our method for the theories of integers and real numbers with a practical tool (STAUB). Our evaluation demonstrates that theory arbitrage alone can speed up individual constraints by orders of magnitude and achieve up to a 1.4× speedup on average across nonlinear integer benchmarks. Furthermore, it enables the use of the recent compiler optimization-based technique SLOT for unbounded SMT theories, unlocking a further speedup of up to 3×. Finally, we incorporate STAUB into a practical termination proving tool and observe an overall 9% improvement in performance.

Compilation of Qubit Circuits to Optimized Qutrit Circuits

Quantum computers are a revolutionary class of computational platforms that are capable of solving computationally hard problems. However, today’s quantum hardware is subject to noise and decoherence issues that together limit the scale and complexity of the quantum circuits that can be implemented. Recently, practitioners have developed qutrit-based quantum hardware platforms that compute over 0, 1, and 2 states, and have presented circuit depth reduction techniques using qutrits’ higher energy 2 states to temporarily store information. However, thus far, such quantum circuits that use higher order states for temporary storage need to be manually crafted by hardware designers. We present , an optimizing compiler for qutrit circuits that implement qubit computations. deploys a qutrit circuit decomposition algorithm and a rewrite engine to construct and optimize qutrit circuits. We evaluate against hand-optimized qutrit circuits and qubit circuits, and find delivers up to 65% depth improvement over manual qutrit implementations, and 43-75% depth improvement over qubit circuits. We also perform a fidelity analysis and find -optimized qutrit circuits deliver up to 8.9× higher fidelity circuits than their manually implemented counterparts.

Optimistic Stack Allocation and Dynamic Heapification for Managed Runtimes

The runtimes of managed object-oriented languages such as Java allocate objects on the heap, and rely on automatic garbage collection (GC) techniques for freeing up unused objects. Most such runtimes also consist of just-in-time (JIT) compilers that optimize memory access and GC times by employing escape analysis: an object that does not escape (outlive) its allocating method can be allocated on (and freed up with) the stack frame of the corresponding method. However, in order to minimize the time spent in JIT compilation, the scope of such useful analyses is quite limited, thereby restricting their precision significantly. On the contrary, even though it is feasible to perform precise program analyses statically, it is not possible to use their results in a managed runtime without a closed-world assumption. In this paper, we propose a static+dynamic scheme that allows one to harness the results of a precise static escape analysis for allocating objects on stack, while taking care of both soundness and efficiency concerns in the runtime. Our scheme comprises of three key ideas. First, using the results of a statically performed escape analysis, it performs optimistic stack allocation during JIT compilation. Second, it handles the challenges associated with features that may invalidate the optimism, using a novel idea of dynamic heapification. Third, it uses another novel notion of stack ordering, again supported by a static analysis, to reduce the overheads associated with the checks that determine the need for heapification. The static and the runtime components of our approach are implemented in the Soot optimization framework and in the tiered infrastructure of the Eclipse OpenJ9 VM, respectively. To evaluate the benefits, we compare our scheme with the existing escape analysis and find that it succeeds in allocating a much larger number of objects on the stack. Furthermore, the enhanced stack allocation leads to a significant reduction in the number of GC cycles and brings decent performance improvements, especially suited for constrained-memory environments.

A Verified Compiler for a Functional Tensor Language

Producing efficient array code is crucial in high-performance domains like image processing and machine learning. It requires the ability to control factors like compute intensity and locality by reordering computations into different stages and granularities with respect to where they are stored. However, traditional pure, functional tensor languages struggle to do so. In a previous publication, we introduced ATL as a pure, functional tensor language capable of systematically decoupling compute and storage order via a set of high-level combinators known as reshape operators. Reshape operators are a unique functional-programming construct since they manipulate storage location in the generated code by modifying the indices that appear on the left-hand sides of storage expressions. We present a formal correctness proof for an implementation of the compilation algorithm, marking the first verification of a lowering algorithm targeting imperative loop nests from a source functional language that enables separate control of compute and storage ordering. One of the core difficulties of this proof required properly formulating the complex invariants to ensure that these storage-index remappings were well-formed. Notably, this exercise revealed a soundness bug in the original published compilation algorithm regarding the truncation reshape operators. Our fix is a new type system that captures safety conditions that were previously implicit and enables us to prove compiler correctness for well-typed source programs. We evaluate this type system and compiler implementation on a range of common programs and optimizations, including but not limited to those previously studied to demonstrate performance comparable to established compilers like Halide.

IsoPredict: Dynamic Predictive Analysis for Detecting Unserializable Behaviors in Weakly Isolated Data Store Applications

Distributed data stores typically provide weak isolation levels, which are efficient but can lead to unserializable behaviors, which are hard for programmers to understand and often result in errors. This paper presents the first dynamic predictive analysis for data store applications under weak isolation levels, called IsoPredict. Given an observed serializable execution of a data store application, IsoPredict generates and solves SMT constraints to find an unserializable execution that is a feasible execution of the application. IsoPredict introduces novel techniques to handle divergent application behavior; to solve mutually recursive sets of constraints; and to balance coverage, precision, and performance. An evaluation shows IsoPredict finds unserializable behaviors in four data store benchmarks, and that more than 99% of its predicted executions are feasible.

Compiling with Abstract Interpretation

Rewriting and static analyses are mutually beneficial techniques: program transformations change the intensional aspects of the program, and can thus improve analysis precision, while some efficient transformations are enabled by specific knowledge of some program invariants. Despite the strong interaction between these techniques, they are usually considered distinct. In this paper, we demonstrate that we can turn abstract interpreters into compilers, using a simple free algebra over the standard signature of abstract domains. Functor domains correspond to compiler passes, for which soundness is translated to a proof of forward simulation, and completeness to backward simulation. We achieve translation to SSA using an abstract domain with a non-standard SSA signature. Incorporating such an SSA translation to an abstract interpreter improves its precision; in particular we show that an SSA-based non-relational domain is always more precise than a standard non-relational domain for similar time and memory complexity. Moreover, such a domain allows recovering from precision losses that occur when analyzing low-level machine code instead of source code. These results help implement analyses or compilation passes where symbolic and semantic methods simultaneously refine each other, and improves precision when compared to doing the passes in sequence.

Associated Effects: Flexible Abstractions for Effectful Programming

We present associated effects, a programming language feature that enables type classes to abstract over the effects of their function signatures, allowing each type class instance to specify its concrete effects.

Associated effects significantly increase the flexibility and expressive power of a programming language that combines a type and effect system with type classes. In particular, associated effects allow us to (i) abstract over total and partial functions, where partial functions may throw exceptions, (ii) abstract over immutable data structures and mutable data structures that have heap effects, and (iii) implement adaptors that combine type classes with algebraic effects.

We implement associated effects as an extension of the Flix programming language and refactor the Flix Standard Library to use associated effects, significantly increasing its flexibility and expressive power. Specifically, we add associated effects to 11 type classes, which enables us to add 28 new type class instances.

Efficient Static Vulnerability Analysis for JavaScript with Multiversion Dependency Graphs

While static analysis tools that rely on Code Property Graphs (CPGs) to detect security vulnerabilities have proven effective, deciding how much information to include in the graphs remains a challenge. Including less information can lead to a more scalable analysis but at the cost of reduced effectiveness in identifying vulnerability patterns, potentially resulting in classification errors. Conversely, more information in the graph allows for a more effective analysis but may affect scalability. For example, scalability issues have been recently highlighted in ODGen, the state-of-the-art CPG-based tool for detecting Node.js vulnerabilities.

This paper examines a new point in the design space of CPGs for JavaScript vulnerability detection. We introduce the Multiversion Dependency Graph (MDG), a novel graph-based data structure that captures the state evolution of objects and their properties during program execution. Compared to the graphs used by ODGen, MDGs are significantly simpler without losing key information needed for vulnerability detection. We implemented Graph.js, a new MDG-based static vulnerability scanner specialized in analyzing npm packages and detecting taint-style and prototype pollution vulnerabilities. Our evaluation shows that Graph.js outperforms ODGen by significantly reducing both the false negatives and the analysis time. Additionally, we have identified 49 previously undiscovered vulnerabilities in npm packages.

Floating-Point TVPI Abstract Domain

Floating-point arithmetic is natively supported in hardware and the preferred choice when implementing numerical software in scientific or engineering applications. However, such programs are notoriously hard to analyze due to round-off errors and the frequent use of elementary functions such as log, arctan, or sqrt. In this work, we present the Two Variables per Inequality Floating-Point (TVPI-FP) domain, a numerical and constraint-based abstract domain designed for the analysis of floating-point programs. TVPI-FP supports all features of real-world floating-point programs including conditional branches, loops, and elementary functions and it is efficient asymptotically and in practice. Thus it overcomes limitations of prior tools that often are restricted to straight-line programs or require the use of expensive solvers. The key idea is the consistent use of interval arithmetic in inequalities and an associated redesign of all operators. Our extensive experiments show that TVPI-FP is often orders of magnitudes faster than more expressive tools at competitive, or better precision while also providing broader support for realistic programs with loops and conditionals.

NetBlocks: Staging Layouts for High-Performance Custom Host Network Stacks

Modern network applications and environments, ranging from data centers and IoT devices to AR/VR headsets and underwater robotics, present diverse requirements that cannot be satisfied by the all or-nothing approach of TCP and UDP protocols. Network researchers and engineers need to create highly tailored protocols targeting individual problem domains. Existing library-based approaches either fall short on the flexibility in features or offer them at a significant performance overhead. To address this challenge, we present NetBlocks, a domain-specific language, and compiler for designing ad-hoc protocols and generating their highly optimized host network stack implementations. NetBlocks DSL input allows users to configure protocols by selecting and customizing features. Unlike other DSL compilers, NetBlocks also allows network researchers to extend the system and add more features easily without any prior compiler knowledge. Our design and implementation employ a high-performance Aspect-Oriented Programming framework written with the staging framework BuildIt. We also introduce a novel Layout Customization Layer that allows "staging packet layouts" alongside the implementation, which is critical for getting the best performance out of the protocol when possible, while allowing the practitioners to maintain compatibility with existing protocol layers where needed. Our evaluations on three applications ranging across deployments in data centers and underwater acoustic networks demonstrate a trade-off between performance (both latency and throughput) and selected features allowing the user to only pay-for what-they-use.

The T-Complexity Costs of Error Correction for Control Flow in Quantum Computation

Numerous quantum algorithms require the use of quantum error correction to overcome the intrinsic unreliability of physical qubits. However, quantum error correction imposes a unique performance bottleneck, known as T-complexity, that can make an implementation of an algorithm as a quantum program run more slowly than on idealized hardware. In this work, we identify that programming abstractions for control flow, such as the quantum if-statement, can introduce polynomial increases in the T-complexity of a program. If not mitigated, this slowdown can diminish the computational advantage of a quantum algorithm.

To enable reasoning about the costs of control flow, we present a cost model that a developer can use to accurately analyze the T-complexity of a program under quantum error correction and pinpoint the sources of slowdown. To enable the mitigation of these costs, we present a set of program-level optimizations that a developer can use to rewrite a program to reduce its T-complexity, predict the T-complexity of the optimized program using the cost model, and then compile it to an efficient circuit via a straightforward strategy.

We implement the program-level optimizations in Spire, an extension of the Tower quantum compiler. Using a set of 11 benchmark programs that use control flow, we empirically show that the cost model is accurate, and that Spire’s optimizations recover programs that are asymptotically efficient, meaning their runtime T-complexity under error correction is equal to their time complexity on idealized hardware.

Our results show that optimizing a program before it is compiled to a circuit can yield better results than compiling the program to an inefficient circuit and then invoking a quantum circuit optimizer found in prior work. For our benchmarks, only 2 of 8 tested quantum circuit optimizers recover circuits with asymptotically efficient T-complexity. Compared to these 2 optimizers, Spire uses 54×–2400× less compile time.

The Functional Essence of Imperative Binary Search Trees

Algorithms on restructuring binary search trees are typically presented in imperative pseudocode. Understandably so, as their performance relies on in-place execution, rather than the repeated allocation of fresh nodes in memory. Unfortunately, these imperative algorithms are notoriously difficult to verify as their loop invariants must relate the unfinished tree fragments being rebalanced. This paper presents several novel functional algorithms for accessing and inserting elements in a restructuring binary search tree that are as fast as their imperative counterparts; yet the correctness of these functional algorithms is established using a simple inductive argument. For each data structure, move-to-root, splay, and zip trees, this paper describes both a bottom-up algorithm using zippers and a top-down algorithm using a novel first-class constructor context primitive. The functional and imperative algorithms are equivalent: we mechanise the proofs establishing this in the Coq proof assistant using the Iris framework. This yields a first fully verified implementation of well known algorithms on binary search trees with performance on par with the fastest implementations in C.

Compositional Semantics for Shared-Variable Concurrency

We revisit the fundamental problem of defining a compositional semantics for a concurrent programming language under sequentially consistent memory with the aim of equating the denotations of pieces of code if and only if these pieces induce the same behavior under all program contexts. While the denotational semantics presented by Brookes [Information and Computation 127, 2 (1996)] has been considered a definitive solution, we observe that Brookes's full abstraction result crucially relies on the availability of an impractical whole-memory atomic read-modify-write instruction. In contrast, we consider a language with standard primitives, which apply to a single variable. For that language, we propose an alternative denotational semantics based on traces that track program write actions together with the writes expected from the environment, and equipped with several closure operators to achieve necessary abstraction. We establish the adequacy of the semantics, and demonstrate full abstraction for the case that the analyzed code segment is loop-free. Furthermore, we show that by including a whole-memory atomic read in the language, one obtains full abstraction for programs with loops. To gain confidence, our results are fully mechanized in Coq.

Falcon: A Fused Approach to Path-Sensitive Sparse Data Dependence Analysis

This paper presents a scalable path- and context-sensitive data dependence analysis. The key is to address the aliasing-path-explosion problem when enforcing a path-sensitive memory model. Specifically, our approach decomposes the computational efforts of disjunctive reasoning into 1) a context- and semi-path-sensitive analysis that concisely summarizes data dependence as the symbolic and storeless value-flow graphs, and 2) a demand-driven phase that resolves transitive data dependence over the graphs, piggybacking the computation of fully path-sensitive pointer information with the resolution of data dependence of interest. We have applied the approach to two clients, namely thin slicing and value-flow bug finding. Using a suite of 16 C/C++ programs ranging from 13 KLoC to 8 MLoC, we compare our techniques against a diverse group of state-of-the-art analyses, illustrating the significant precision and scalability advantages of our approach.

Allo: A Programming Model for Composable Accelerator Design

Special-purpose hardware accelerators are increasingly pivotal for sustaining performance improvements in emerging applications, especially as the benefits of technology scaling continue to diminish. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures in a productive manner. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. Despite the introduction of several new accelerator design languages (ADLs) aiming to enhance or replace HLS, their advantages are more evident in relatively simple applications with a single kernel. Existing ADLs prove less effective for realistic hierarchical designs with multiple kernels, even if the design hierarchy is flattened. In this paper, we introduce Allo, a composable programming model for efficient spatial accelerator design. Allo decouples hardware customizations, including compute, memory, communication, and data type from algorithm specification, and encapsulates them as a set of customization primitives. Allo preserves the hierarchical structure of an input program by combining customizations from different functions in a bottom-up, type-safe manner. This approach facilitates holistic optimizations that span across function boundaries. We conduct comprehensive experiments on commonly-used HLS benchmarks and several realistic deep learning models. Our evaluation shows that Allo can outperform state-of-the-art HLS tools and ADLs on all test cases in the PolyBench. For the GPT2 model, the inference latency of the Allo generated accelerator is 1.7x faster than the NVIDIA A100 GPU with 5.4x higher energy efficiency, demonstrating the capability of Allo to handle large-scale designs.

VESTA: Power Modeling with Language Runtime Events

Power modeling is an essential building block for computer systems in support of energy optimization, energy profiling, and energy-aware application development. We introduce VESTA, a novel approach to modeling the power consumption of applications with one key insight: language runtime events are often correlated with a sustained level of power consumption. When compared with the established approach of power modeling based on hardware performance counters (HPCs), VESTA has the benefit of solely requiring application-scoped information and enabling a higher level of explainability, while achieving comparable or even higher precision. Through experiments performed on 37 real-world applications on the Java Virtual Machine (JVM), we find the power model built by VESTA is capable of predicting energy consumption with a mean absolute percentage error of 1.56%, while the monitoring of language runtime events incurs small performance and energy overhead.

Mechanised Hypersafety Proofs about Structured Data

Arrays are a fundamental abstraction to represent collections of data. It is often possible to exploit structural properties of the data stored in an array (e.g., repetition or sparsity) to develop a specialised representation optimised for space efficiency. Formally reasoning about correctness of manipulations with such structured data is challenging, as they are often composed of multiple loops with non-trivial invariants. In this work, we observe that specifications for structured data manipulations can be phrased as hypersafety properties, i.e., predicates that relate traces of k programs. To turn this observation into an effective verification methodology, we developed the Logic for Graceful Tensor Manipulation (LGTM), a new Hoare-style relational separation logic for specifying and verifying computations over structured data. The key enabling idea of LGTM is that of parametrised hypersafety specifications that allow the number k of the program components to depend on the program variables. We implemented LGTM as a foundational embedding into Coq, mechanising its rules, meta-theory, and the proof of soundness. Furthermore, we developed a library of domain-specific tactics that automate computer-aided hypersafety reasoning, resulting in pleasantly short proof scripts that enjoy a high degree of reuse. We argue for the effectiveness of relational reasoning about structured data in LGTM by specifying and mechanically proving correctness of 13 case studies including computations on compressed arrays and efficient operations over multiple kinds of sparse tensors.

Refined Input, Degraded Output: The Counterintuitive World of Compiler Behavior

To optimize a program, a compiler needs precise information about it. Significant effort is dedicated to improving the ability of compilers to analyze programs, with the expectation that more information results in better optimization. But this assumption does not always hold: due to unexpected interactions between compiler components and phase ordering issues, sometimes more information leads to worse optimization. This can lead to wasted research and engineering effort whenever compilers cannot efficiently leverage additional information. In this work, we systematically examine the extent to which additional information can be detrimental to compilers. We consider two types of information: dead code, i.e., whether a program location is unreachable, and value ranges, i.e., the possible values a variable can take at a specific program location. Given a seed program, we refine it with additional information and check whether this degrades the output. Based on this approach, we develop a fully automated and effective testing method for identifying such issues, and through an extensive evaluation and analysis, we quantify their existence and prevalence in widely used compilers. In particular, we have reported 59 cases in GCC and LLVM, of which 55 have been confirmed or fixed so far, highlighting the practical relevance and value of our findings. This work’s fresh perspective opens up a new direction in understanding and improving compilers.

Jacdac: Service-Based Prototyping of Embedded Systems

The traditional approach to programming embedded systems is monolithic: firmware on a microcontroller contains both application code and the drivers needed to communicate with sensors and actuators, using low-level protocols such as I2C, SPI, and RS232. In comparison, software development for the cloud has moved to a service-based development and operation paradigm: a service provides a discrete unit of functionality that can be accessed remotely by an application, or other service, but is independently managed and updated. We propose, design, implement, and evaluate a service-based approach to prototyping embedded systems called Jacdac. Jacdac defines a service specification language, designed especially for embedded systems, along with a host of specifications for a variety of sensors and actuators. With Jacdac, each sensor/actuator in a system is paired with a low-cost microcontroller that advertises the services that represent the functionality of the underlying hardware over an efficient and low-cost single-wire bus protocol. A separate microcontroller executes the user's application program, which is a client of the Jacdac services on the bus. Our evaluation shows that Jacdac supports a service-based abstraction for sensors/actuators at low cost and reasonable performance, with many benefits for prototyping: ease of use via the automated discovery of devices and their capabilities, substitution of same-service devices for each other, as well as high-level programming, monitoring, and debugging. We also report on the experience of bringing Jacdac to commercial availability via third-party manufacturers.

Don’t Write, but Return: Replacing Output Parameters with Algebraic Data Types in C-to-Rust Translation

Translating legacy system programs from C to Rust is a promising way to enhance their reliability. To alleviate the burden of manual translation, automatic C-to-Rust translation is desirable. However, existing translators fail to generate Rust code fully utilizing Rust’s language features, including algebraic data types. In this work, we focus on tuples and Option/Result types, an important subset of algebraic data types. They are used as functions’ return types to represent those returning multiple values and those that may fail. Due to the absence of these types, C programs use output parameters, i.e., pointer-type parameters for producing outputs, to implement such functions. As output parameters make code less readable and more error-prone, their use is discouraged in Rust. To address this problem, this paper presents a technique for removing output parameters during C-to-Rust translation. This involves three steps: (1) syntactically translating C code to Rust using an existing translator; (2) analyzing the Rust code to extract information related to output parameters; and (3) transforming the Rust code using the analysis result. The second step poses several challenges, including the identification and classification of output parameters. To overcome these challenges, we propose a static analysis based on abstract interpretation, complemented by the notion of abstract read/write sets, which approximate the sets of read/written pointers, and two sensitivities: write set sensitivity and nullity sensitivity. Our evaluation shows that the proposed technique is (1) scalable, with the analysis and transformation of 190k LOC within 213 seconds, (2) useful, with the detection of 1,670 output parameters across 55 real-world C programs, and (3) mostly correct, with 25 out of 26 programs passing their test suites after the transformation.

Quantitative Robustness for Vulnerability Assessment

Most software analysis techniques focus on bug reachability. However, this approach is not ideal for security evaluation as it does not take into account the difficulty of triggering said bugs. The recently introduced notion of robust reachability tackles this issue by distinguishing between bugs that can be reached independently from uncontrolled inputs, from those that cannot. Yet, this qualitative notion is too strong in practice as it cannot distinguish mostly replicable bugs from truly unrealistic ones. In this work we propose a more flexible quantitative version of robust reachability together with a dedicated form of symbolic execution, in order to automatically measure the difficulty of triggering bugs. This quantitative robust symbolic execution (QRSE) relies on a variant of model counting, called functional E-MAJSAT, which allows to account for the asymmetry between attacker-controlled and uncontrolled variables. While this specific model counting problem has been studied in AI research fields such as Bayesian networks, knowledge representation and probabilistic planning, its use within the context of formal verification presents a new set of challenges. We show the applicability of our solutions through security-oriented case studies, including real-world vulnerabilities such as CVE-2019-20839 from libvncserver.

Automated Verification of Fundamental Algebraic Laws

Algebraic laws of functions in mathematics – such as commutativity, associativity, and idempotence – are often used as the basis to derive more sophisticated properties of complex mathematical structures and are heavily used in abstract computational thinking. Algebraic laws of functions in coding, however, are rarely considered. Yet, they are essential. For example, commutativity and associativity are crucial to ensure correctness of a variety of software systems in numerous domains, such as compiler optimization, big data processing, data flow processing, machine learning or distributed algorithms and data structures. Still, most programming languages lack built-in mechanisms to enforce and verify that operations adhere to such properties.

In this paper, we propose a verifier specialized on a set of fundamental algebraic laws that ensures that such laws hold in application code. The verifier can conjecture auxiliary properties and can reason about both equalities and inequalities of expressions, which is crucial to prove a given property when other competitors do not succeed. We implement these ideas in the Propel verifier. Our evaluation against five state-of-the-art verifiers on a total of 142 instances of algebraic properties shows that Propel is able to automatically deduce algebraic properties in different domains that rely on such properties for correctness, even in cases where competitors fail to verify the same properties or time out.

GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables

This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL’s query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data, which makes it possible to use models written in a variety of probabilistic programming languages that are tailored to specific workflows. Probabilistic models may be automatically learned via probabilistic program synthesis, hand-designed, or a combination of both. GenSQL is formalized using a novel type system and denotational semantics, which together enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two case real-world studies—an anomaly detection in clinical trials and conditional synthetic data generation for a virtual wet lab—and show that GenSQL more accurately captures the complexity of the data as compared to common baselines. We also show that the declarative syntax in GenSQL is more concise and less error-prone as compared to several alternatives. Finally, GenSQL delivers a 1.7-6.8x speedup compared to its closest competitor on a representative benchmark set and runs in comparable time to hand-written code, in part due to its reusable optimizations and code specialization.

Daedalus: Safer Document Parsing

Despite decades of contributions to the theoretical foundations of parsing and the many tools available to aid in parser development, many security attacks in the wild still exploit parsers. The issues are myriad—flaws in memory management in contexts lacking memory safety, flaws in syntactic or semantic validation of input, and misinterpretation of hundred-page-plus standards documents. It remains challenging to build and maintain parsers for common, mature data formats. In response to these challenges, we present Daedalus, a new domain-specific language (DSL) and toolchain for writing safe parsers. Daedalus is built around functional-style parser combinators, which suit the rich data dependencies often found in complex data formats. It adds domain-specific constructs for stream manipulation, allowing the natural expression of parsing noncontiguous formats. Balancing between expressivity and domain-specific constructs lends Daedalus specifications simplicity and leaves them amenable to analysis. As a stand-alone DSL, Daedalus is able to generate safe parsers in multiple languages, currently C++ and Haskell. We have implemented 20 data formats with Daedalus, including two large, complex formats—PDF and NITF—and our evaluation shows that Daedalus parsers are concise and performant. Our experience with PDF forms our largest case study. We worked with the PDF Association to build a reference implementation, which was subject to a red-teaming exercise along with a number of other PDF parsers and was the only parser to be found free of defects.

Descend: A Safe GPU Systems Programming Language

Graphics Processing Units (GPU) offer tremendous computational power by following a throughput oriented paradigm where many thousand computational units operate in parallel. Programming such massively parallel hardware is challenging. Programmers must correctly and efficiently coordinate thousands of threads and their accesses to various shared memory spaces. Existing mainstream GPU programming languages, such as CUDA and OpenCL, are based on C/C++ inheriting their fundamentally unsafe ways to access memory via raw pointers. This facilitates easy to make, but hard to detect bugs, such as data races and deadlocks.

In this paper, we present Descend: a safe GPU programming language. In contrast to prior safe high-level GPU programming approaches, Descend is an imperative GPU systems programming language in the spirit of Rust, enforcing safe CPU and GPU memory management in the type system by tracking Ownership and Lifetimes. Descend introduces a new holistic GPU programming model where computations are hierarchically scheduled over the GPU’s execution resources: grid, blocks, warps, and threads. Descend’s extended Borrow checking ensures that execution resources safely access memory regions without data races. For this, we introduced views describing safe parallel access patterns of memory regions, as well as atomic variables. For memory accesses that can’t be checked by our type system, users can annotate limited code sections as unsafe.

We discuss the memory safety guarantees offered by Descend and evaluate our implementation using multiple benchmarks, demonstrating that Descend is capable of expressing real-world GPU programs showing competitive performance compared to manually written CUDA programs lacking Descend’s safety guarantees.

Bit Blasting Probabilistic Programs

Probabilistic programming languages (PPLs) are an expressive means for creating and reasoning about probabilistic models. Unfortunately hybrid probabilistic programs that involve both continuous and discrete structures are not well supported by today’s PPLs. In this paper we develop a new approximate inference algorithm for hybrid probabilistic programs that first discretizes the continuous distributions and then performs discrete inference on the resulting program. The key novelty is a form of discretization that we call bit blasting, which uses a binary representation of numbers such that a domain of 2b discretized points can be succinctly represented as a discrete probabilistic program over poly(b) Boolean random variables. Surprisingly, we prove that many common continuous distributions can be bit blasted in a manner that incurs no loss of accuracy over an explicit discretization and supports efficient probabilistic inference. We have built a probabilistic programming system for hybrid programs called HyBit, which employs bit blasting followed by discrete probabilistic inference. We empirically demonstrate the benefits of our approach over existing sampling-based and symbolic inference approaches.

Quiver: Guided Abductive Inference of Separation Logic Specifications in Coq

Over the past two decades, there has been a great deal of progress on verification of full functional correctness of programs using separation logic, sometimes even producing “foundational” proofs in proof assistants like Coq. Unfortunately, even though existing approaches to this problem provide significant support for automated verification, they still incur a significant specification overhead: the user must supply the specification against which the program is verified, and the specification may be long, complex, or tedious to formulate. In this paper, we introduce Quiver, the first technique for inferring functional correctness specifications in separation logic while simultaneously verifying foundationally that they are correct. To guide Quiver towards the final specification, we take hints from the user in the form of a specification sketch, and then complete the sketch using inference. To do so, Quiver introduces a new abductive deductive verification technique, which integrates ideas from abductive inference (for specification inference) together with deductive separation logic automation (for foundational verification). The result is that users have to provide some guidance, but significantly less than with traditional deductive verification techniques based on separation logic. We have evaluated Quiver on a range of case studies, including code from popular open-source libraries.

Program Analysis for Adaptive Data Analysis

Data analyses are usually designed to identify some property of the population from which the data are drawn, generalizing beyond the specific data sample. For this reason, data analyses are often designed in a way that guarantees that they produce a low generalization error. That is, they are designed so that the result of a data analysis run on a sample data does not differ too much from the result one would achieve by running the analysis over the entire population. An adaptive data analysis can be seen as a process composed by multiple queries interrogating some data, where the choice of which query to run next may rely on the results of previous queries. The generalization error of each individual query/analysis can be controlled by using an array of well-established statistical techniques. However, when queries are arbitrarily composed, the different errors can propagate through the chain of different queries and bring to a high generalization error. To address this issue, data analysts are designing several techniques that not only guarantee bounds on the generalization errors of single queries, but that also guarantee bounds on the generalization error of the composed analyses. The choice of which of these techniques to use, often depends on the chain of queries that an adaptive data analysis can generate. In this work, we consider adaptive data analyses implemented as while-like programs and we design a program analysis which can help with identifying which technique to use to control their generalization errors. More specifically, we formalize the intuitive notion of adaptivity as a quantitative property of programs. We do this because the adaptivity level of a data analysis is a key measure to choose the right technique. Based on this definition, we design a program analysis for soundly approximating this quantity. The program analysis generates a representation of the data analysis as a weighted dependency graph, where the weight is an upper bound on the number of times each variable can be reached, and uses a path search strategy to guarantee an upper bound on the adaptivity. We implement our program analysis and show that it can help to analyze the adaptivity of several concrete data analyses with different adaptivity structures.

Superfusion: Eliminating Intermediate Data Structures via Inductive Synthesis

Intermediate data structures are a common cause of inefficiency in functional programming. Fusion attempts to eliminate intermediate data structures by combining adjacent data traversals into one; existing fusion techniques, however, are based on predefined rewrite rules and hence are limited in expressiveness.

In this work we explore a different approach to eliminating intermediate data structures, based on inductive program synthesis. We dub this approach superfusion (by analogy with superoptimization, which uses inductive synthesis for program optimization). Starting from a reference program annotated with data structures to be eliminated, superfusion first generates a sketch where program fragments operating on those data structures are replaced with holes; it then fills the holes with constant-time expressions such that the resulting program is equivalent to the reference. The main technical challenge here is scalability because optimized programs are often complex, making the search space intractably large for naive enumeration. To address this challenge, our key insight is to first synthesize a ghost function that describes the relationship between the original intermediate data structure and its compressed version; this function, although not used in the final program, serves to decompose the joint sketch filling problem into independent simpler problems for each hole.

We implement superfusion in a tool called SuFu and evaluate it on a dataset of 290 tasks collected from prior work on deductive fusion and program restructuring. The results show that SuFu solves 264 out of 290 tasks, exceeding the capabilities of rewriting-based fusion systems and achieving comparable performance with specialized approaches to program restructuring on their respective domains.

Consolidating Smart Contracts with Behavioral Contracts

Ensuring the reliability of smart contracts is of vital importance due to the wide adoption of smart contract programs in decentralized financial applications. However, statically checking many rich properties of smart contract programs can be challenging. On the other hand, dynamic validation approaches have shown promise for widespread adoption in practice. Nevertheless, as part of the programming environment for smart contracts, existing dynamic validation approaches have not provided programmers with a notion to clearly articulate the interface between components, especially for addresses representing opaque contract instances. We argue that the “design-by-contract” approach should complement the development of smart contract programs. Unfortunately, there is only limited linguistic support for that in existing smart contract languages. In this paper, we design a Solidity language extension ConSol that supports behavioral contracts. ConSol provides programmers with a modular specification and monitoring system for both functional and latent address behaviors. The key capability of ConSol is to attach specifications to first-class addresses and monitor violations when invoking these addresses. We evaluate ConSol using 20 real-world cases, demonstrating its effectiveness in expressing critical conditions and preventing attacks. Additionally, we assess ConSol’s efficiency and compare gas consumption with manually inserted assertions, showing that our approach introduces only marginal gas overhead. By separating specifications and implementations using behavioral contracts, ConSol assists programmers in writing smart contract code that is more robust and readable.

Scaling Type-Based Points-to Analysis with Saturation

Designing a whole-program static analysis requires trade-offs between precision and scalability. While a context-insensitive points-to analysis is often considered a good compromise, it still has non-linear complexity that leads to scalability problems when analyzing large applications. On the other hand, rapid type analysis scales well but lacks precision. We use saturation in a context-insensitive type-based points-to analysis to make it as scalable as a rapid type analysis, while preserving most of the precision of the points-to analysis. With saturation, the points-to analysis only propagates small points-to sets for variables. If a variable can have more values than a certain threshold, the variable and all its usages are considered saturated and no longer analyzed. Our implementation in the points-to analysis of GraalVM Native Image, a closed-world approach to build standalone binaries for Java applications, shows that saturation allows GraalVM Native Image to analyze large Java applications with hundreds of thousands of methods in less than two ‍minutes.

From Batch to Stream: Automatic Generation of Online Algorithms

Online streaming algorithms, tailored for continuous data processing, offer substantial benefits but are often more intricate to design than their offline counterparts. This paper introduces a novel approach for automatically synthesizing online streaming algorithms from their offline versions. In particular, we propose a novel methodology, based on the notion of relational function signature (RFS), for deriving an online algorithm given its offline version. Then, we propose a concrete synthesis algorithm that is an instantiation of the proposed methodology. Our algorithm uses the RFS to decompose the synthesis problem into a set of independent subtasks and uses a combination of symbolic reasoning and search to solve each subproblem. We implement the proposed technique in a new tool called Opera and evaluate it on over 50 tasks spanning two domains: statistical computations and online auctions. Our results show that Opera can automatically derive the online version of the original algorithm for 98% of the tasks. Our experiments also demonstrate that Opera significantly outperforms alternative approaches, including adaptations of SyGuS solvers to this problem as well as two of Opera's own ablations.

Symbolic Execution for Quantum Error Correction Programs

We define QSE, a symbolic execution framework for quantum programs by integrating symbolic variables into quantum states and the outcomes of quantum measurements. The soundness of QSE is established through a theorem that ensures the correctness of symbolic execution within operational semantics. We further introduce symbolic stabilizer states, which symbolize the phases of stabilizer generators, for the efficient analysis of quantum error correction (QEC) programs. Within the QSE framework, we can use symbolic expressions to characterize the possible discrete Pauli errors in QEC, providing a significant improvement over existing methods that rely on sampling with simulators. We implement QSE with the support of symbolic stabilizer states in a prototype tool named QuantumSE.jl. Our experiments on representative QEC codes, including quantum repetition codes, Kitaev's toric codes, and quantum Tanner codes, demonstrate the efficiency of QuantumSE.jl for debugging QEC programs with over 1000 qubits. In addition, by substituting concrete values in symbolic expressions of measurement results, QuantumSE.jl is also equipped with a sampling feature for stabilizer circuits. Despite a longer initialization time than the state-of-the-art stabilizer simulator, Google's Stim, QuantumSE.jl offers a quicker sampling rate in the experiments.

Wavefront Threading Enables Effective High-Level Synthesis

Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware designs. This paper describes Kanagawa, a language that takes a new approach to combine the programmer productivity benefits of traditional High-Level Synthesis (HLS) approaches with the expressibility and hardware efficiency of Register-Transfer Level (RTL) design. The language's concise syntax, matched with a hardware design-friendly execution model, permits a relatively simple toolchain to map high-level code into efficient hardware implementations.

Decidable Subtyping of Existential Types for Julia

Julia is a modern scientific-computing language that relies on multiple dispatch to implement generic libraries. While the language does not have a static type system, method declarations are decorated with expressive type annotations to determine when they are applicable. To find applicable methods, the implementation uses subtyping at run-time. We show that Julia's subtyping is undecidable, and we propose a restriction on types to recover decidability by stratifying types into method signatures over value types---where the former can freely use bounded existential types but the latter are restricted to use-site variance. A corpus analysis suggests that nearly all Julia programs written in practice already conform to this restriction.

RefinedRust: A Type System for High-Assurance Verification of Rust Programs

Rust is a modern systems programming language whose ownership-based type system statically guarantees memory safety, making it particularly well-suited to the domain of safety-critical systems. In recent years, a wellspring of automated deductive verification tools have emerged for establishing functional correctness of Rust code. However, none of the previous tools produce foundational proofs (machine-checkable in a general-purpose proof assistant), and all of them are restricted to the safe fragment of Rust. This is a problem because the vast majority of Rust programs make use of unsafe code at critical points, such as in the implementation of widely-used APIs. We propose RefinedRust, a refinement type system—proven sound in the Coq proof assistant—with the goal of establishing foundational semi-automated functional correctness verification of both safe and unsafe Rust code. We have developed a prototype verification tool implementing RefinedRust. Our tool translates Rust code (with user annotations) into a model of Rust embedded in Coq, and then checks its adherence to the RefinedRust type system using separation logic automation in Coq. All proofs generated by RefinedRust are checked by the Coq proof assistant, so the automation and type system do not have to be trusted. We evaluate the effectiveness of RefinedRust by verifying a variant of Rust’s Vec implementation that involves intricate reasoning about unsafe pointer-manipulating code.

LiDO: Linearizable Byzantine Distributed Objects with Refinement-Based Liveness Proofs

Byzantine fault-tolerant state machine replication (SMR) protocols, such as PBFT, HotStuff, and Jolteon, are essential for modern blockchain technologies. However, they are challenging to implement correctly because they have to deal with any unexpected message from byzantine peers and ensure safety and liveness at all times. Many formal frameworks have been developed to verify the safety of SMR implementations, but there is still a gap in the verification of their liveness. Existing liveness proofs are either limited to the network level or do not cover popular partially synchronous protocols.

We introduce LiDO, a consensus model that enables the verification of both safety and liveness of implementations through refinement. We observe that current consensus models cannot handle liveness because they do not include a pacemaker state. We show that by adding a pacemaker state to the LiDO model, we can express the liveness properties of SMR protocols as a few safety properties that can be easily verified by refinement proofs. Based on our LiDO model, we provide mechanized safety and liveness proofs for both unpipelined and pipelined Jolteon in Coq. This is the first mechanized liveness proof for a byzantine consensus protocol with non-trivial optimizations such as pipelining.

Reducing Static Analysis Unsoundness with Approximate Interpretation

Static program analysis for JavaScript is more difficult than for many other programming languages. One of the main reasons is the presence of dynamic property accesses that read and write object properties via dynamically computed property names. To ensure scalability and precision, existing state-of-the-art analyses for JavaScript mostly ignore these operations although it results in missed call edges and aliasing relations.

We present a novel dynamic analysis technique named approximate interpretation that is designed to efficiently and fully automatically infer likely determinate facts about dynamic property accesses, in particular those that occur in complex library API initialization code, and how to use the produced information in static analysis to recover much of the abstract information that is otherwise missed.

Our implementation of the technique and experiments on 141 real-world Node.js-based JavaScript applications and libraries show that the approach leads to significant improvements in call graph construction. On average the use of approximate interpretation leads to 55.1% more call edges, 21.8% more reachable functions, 17.7% more resolved call sites, and only 1.5% fewer monomorphic call sites. For 36 JavaScript projects where dynamic call graphs are available, average analysis recall is improved from 75.9% to 88.1% with a negligible reduction in precision.

Verification under Intel-x86 with Persistency

The full semantics of the Intel-x86 architecture has been defined by Raad et al in POPL 2022, extending the earlier formalization based on the TSO memory model incorporating persistency. This new semantics involves an intricate combination of the SC, TSO, and PSO models to account for the diverse features of the enlarged instruction set. In this paper we investigate the reachability problem under this semantics, including both its consistency and persistency aspects each of which requires reasoning about unbounded operation reorderings. Our first contribution is to show that reachability under this model can be reduced to reachability under a model without the persistency component. This is achieved by showing that the persistency semantics can be simulated by a finite-state protocol running in parallel with the program. Our second contribution is to prove that reachability under the consistency model of Intel-x86 (even without crashes and persistency) is undecidable. Undecidability is obtained as soon as one thread in the program is allowed to use both TSO variables and two PSO variables. The third contribution is showing that for any fixed bound on the alternation between TSO writes (write-backs), and PSO writes (non-temporal writes), the reachability problem is decidable. This defines a complete parametrized schema for under-approximate analysis that can be used for bug finding.

Compilation of Modular and General Sparse Workspaces

Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result tensor. We address this shortcoming through a compiler design that generates code that uses sparse intermediate tensors (sparse workspaces) as efficient adapters between compute code that scatters and result tensors that do not support random insertion. Our compiler automatically detects sparse scattering behavior in tensor expressions and inserts necessary intermediate workspace tensors. We present an algorithm template for workspace insertion that is the backbone of our code generation algorithm. Our algorithm template is modular by design, supporting sparse workspaces that span multiple user-defined implementations. Our evaluation shows that sparse workspaces can be up to 27.12× faster than the dense workspaces of prior work. On the other hand, dense workspaces can be up to 7.58× faster than the sparse workspaces generated by our compiler in other situations, which motivates our compiler design that supports both. Our compiler produces sequential code that is competitive with hand-optimized linear and tensor algebra libraries on the expressions they support, but that generalizes to any other expression. Sparse workspaces are also more memory efficient than dense workspaces as they compress away zeros. This compression can asymptotically decrease memory usage, enabling tensor computations on data that would otherwise run out of memory.

Maximum Consensus Floating Point Solutions for Infeasible Low-Dimensional Linear Programs with Convex Hull as the Intermediate Representation

This paper proposes a novel method to efficiently solve infeasible low-dimensional linear programs (LDLPs) with billions of constraints and a small number of unknown variables, where all the constraints cannot be satisfied simultaneously. We focus on infeasible linear programs generated in the RLibm project for creating correctly rounded math libraries. Specifically, we are interested in generating a floating point solution that satisfies the maximum number of constraints. None of the existing methods can solve such large linear programs while producing floating point solutions.

We observe that the convex hull can serve as an intermediate representation (IR) for solving infeasible LDLPs using the geometric duality between linear programs and convex hulls. Specifically, some of the constraints that correspond to points on the convex hull are precisely those constraints that make the linear program infeasible. Our key idea is to split the entire set of constraints into two subsets using the convex hull IR: (a) a set X of feasible constraints and (b) a superset V of infeasible constraints. Using the special structure of the RLibm constraints and the presence of a method to check whether a system is feasible or not, we identify a superset of infeasible constraints by computing the convex hull in 2-dimensions. Subsequently, we identify the key constraints (i.e., basis constraints) in the set of feasible constraints X and use them to create a new linear program whose solution identifies the maximum set of constraints satisfiable in V while satisfying all the constraints in X. This new solver enabled us to improve the performance of the resulting RLibm polynomials while solving the corresponding linear programs significantly faster.

Qubit Recycling Revisited

Reducing the width of quantum circuits is crucial due to limited number of qubits in quantum devices. This paper revisit an optimization strategy known as qubit recycling (alternatively wire-recycling or measurement-and-reset), which leverages gate commutativity to reuse discarded qubits, thereby reducing circuit width. We introduce qubit dependency graphs (QDGs) as a key abstraction for this optimization. With QDG, we isolate the computationally demanding components, and observe that qubit recycling is essentially a matrix triangularization problem. Based on QDG and this observation, we study qubit recycling with a focus on complexity, algorithmic, and verification aspects. Firstly, we establish qubit recycling’s NP-hardness through reduction from Wilf’s question, another matrix triangularization problem. Secondly, we propose a QDG-guided solver featuring multiple heuristic options for effective qubit recycling. Benchmark tests conducted on RevLib illustrate our solver’s superior or comparable performance to existing alternatives. Notably, it achieves optimal solutions for the majority of circuits. Finally, we develop a certified qubit recycler that integrates verification and validation techniques, with its correctness proof mechanized in Coq.

A Lightweight Polyglot Code Transformation Language

In today's software industry, large-scale, multi-language codebases are the norm. This brings substantial challenges in developing automated tools for code maintenance tasks such as API migration or dead code cleanup. Tool builders often find themselves caught between two less-than-ideal tooling options: (1) language-specific code rewriting tools or (2) generic, lightweight match-replace transformation tools with limited expressiveness. The former leads to tool fragmentation and a steep learning curve for each language, while the latter forces developers to create ad-hoc, throwaway scripts to handle realistic tasks. To fill this gap, we introduce a new declarative domain-specific language (DSL) for expressing interdependent multi-language code transformations. Our key insight is that we can increase the expressiveness and applicability of lightweight match-replace tools by extending them to support for composition, ordering, and flow. We implemented an open-source tool for our language, called PolyglotPiranha, and deployed it in an industrial setting. We demonstrate its effectiveness through three case studies, where it deleted 210K lines of dead code and migrated 20K lines, across 1611 pull requests. We compare our DSL against state-of-the-art alternatives, and show that the tools we developed are faster, more concise, and easier to maintain.

An Algebraic Language for Specifying Quantum Networks

Quantum networks connect quantum capable nodes in order to achieve capabilities that are impossible only using classical information. Their fundamental unit of communication is the Bell pair, which consists of two entangled quantum bits. Unfortunately, Bell pairs are fragile and difficult to transmit directly, necessitating a network of repeaters, along with software and hardware that can ensure the desired results. Challenging intrinsic features of quantum networks, such as dealing with resource competition, motivate formal reasoning about quantum network protocols. To this end, we developed BellKAT, a novel specification language for quantum networks based upon Kleene algebra. To cater to the specific needs of quantum networks, we designed an algebraic structure, called BellSKA, which we use as the basis of BellKAT's denotational semantics. BellKAT's constructs describe entanglement distribution rules that allow for modular specification. We give BellKAT a sound and complete equational theory, allowing us to verify network protocols. We provide a prototype tool to showcase the expressiveness of BellKAT and how to optimize and verify networks in practice.

Linear Matching of JavaScript Regular Expressions

Modern regex languages have strayed far from well-understood traditional regular expressions: they include features that fundamentally transform the matching problem. In exchange for these features, modern regex engines at times suffer from exponential complexity blowups, a frequent source of denial-of-service vulnerabilities in JavaScript applications. Worse, regex semantics differ across languages, and the impact of these divergences on algorithmic design and worst-case matching complexity has seldom been investigated.

This paper provides a novel perspective on JavaScript's regex semantics by identifying a larger-than-previously-understood subset of the language that can be matched with linear time guarantees. In the process, we discover several cases where state-of-the-art algorithms were either wrong (semantically incorrect), inefficient (suffering from superlinear complexity) or excessively restrictive (assuming certain features could not be matched linearly). We introduce novel algorithms to restore correctness and linear complexity. We further advance the state-of-the-art in linear regex matching by presenting the first nonbacktracking algorithms for matching lookarounds in linear time: one supporting captureless lookbehinds in any regex language, and another leveraging a JavaScript property to support unrestricted lookaheads and lookbehinds. Finally, we describe new time and space complexity tradeoffs for regex engines. All of our algorithms are practical: we validated them in a prototype implementation, and some have also been merged in the V8 JavaScript implementation used in Chrome and Node.js.

Static Posterior Inference of Bayesian Probabilistic Programming via Polynomial Solving

In Bayesian probabilistic programming, a central problem is to estimate the normalised posterior distribution (NPD) of a probabilistic program with conditioning via score (a.k.a. observe) statements. Most previous approaches address this problem by Markov Chain Monte Carlo and variational inference, and therefore could not generate guaranteed outcomes within a finite time limit. Moreover, existing methods for exact inference either impose syntactic restrictions or cannot guarantee successful inference in general.

In this work, we propose a novel automated approach to derive guaranteed bounds for NPD via polynomial solving. We first establish a fixed-point theorem for the wide class of score-at-end Bayesian probabilistic programs that terminate almost-surely and have a single bounded score statement at program termination. Then, we propose a multiplicative variant of Optional Stopping Theorem (OST) to address score-recursive Bayesian programs where score statements with weights greater than one could appear inside a loop. Bayesian nonparametric models, enjoying a renaissance in statistics and machine learning, can be represented by score-recursive Bayesian programs and are difficult to handle due to an integrability issue. Finally, we use polynomial solving to implement our fixed-point theorem and OST variant. To improve the accuracy of the polynomial solving, we further propose a truncation operation and the synthesis of multiple bounds over various program inputs. Our approach can handle Bayesian probabilistic programs with unbounded while loops and continuous distributions with infinite supports. Experiments over a wide range of benchmarks show that compared with the most relevant approach (Beutner et al., PLDI 2022) for guaranteed NPD analysis via recursion unrolling, our approach is more time efficient and derives comparable or even tighter NPD bounds. Furthermore, our approach can handle score-recursive programs which previous approaches could not.

A HAT Trick: Automatically Verifying Representation Invariants using Symbolic Finite Automata

Functional programs typically interact with stateful libraries that hide state behind typed abstractions. One particularly important class of applications are data structure implementations that rely on such libraries to provide a level of efficiency and scalability that may be otherwise difficult to achieve. However, because the specifications of the methods provided by these libraries are necessarily general and rarely specialized to the needs of any specific client, any required application-level invariants must often be expressed in terms of additional constraints on the (often) opaque state maintained by the library.

In this paper, we consider the specification and verification of such representation invariants using symbolic finite automata (SFA). We show that SFAs can be used to succinctly and precisely capture fine-grained temporal and data-dependent histories of interactions between functional clients and stateful libraries. To facilitate modular and compositional reasoning, we integrate SFAs into a refinement type system to qualify stateful computations resulting from such interactions. The particular instantiation we consider, Hoare Automata Types (HATs), allows us to both specify and automatically type-check the representation invariants of a datatype, even when its implementation depends on stateful library methods that operate over hidden state.

We also develop a new bidirectional type checking algorithm that implements an efficient subtyping inclusion check over HATs, enabling their translation into a form amenable for SMT-based automated verification. We present extensive experimental results on an implementation of this algorithm that demonstrates the feasibility of type-checking complex and sophisticated HAT-specified OCaml data structure implementations layered on top of stateful library APIs.

Stream Types

We propose a rich foundational theory of typed data streams and stream transformers, motivated by two high-level goals. First, the type of a stream should be able to express complex sequential patterns of events over time. And second, it should describe the internal parallel structure of the stream, to support deterministic stream processing on parallel and distributed systems. To these ends, we introduce stream types, with operators capturing sequential composition, parallel composition, and iteration, plus a core calculus λST of transformers over typed streams that naturally supports a number of common streaming idioms, including punctuation, windowing, and parallel partitioning, as first-class constructions. λST exploits a Curry-Howard-like correspondence with an ordered variant of the Logic of Bunched Implication to program with streams compositionally and uses Brzozowski-style derivatives to enable an incremental, prefix-based operational semantics. To illustrate the programming style supported by the rich types of λST, we present a number of examples written in Delta, a prototype high-level language design based on λST.

SuperStack: Superoptimization of Stack-Bytecode via Greedy, Constraint-Based, and SAT Techniques

Given a loop-free sequence of instructions, superoptimization techniques use a constraint solver to search for an equivalent sequence that is optimal for a desired objective. The complexity of the search grows exponentially with the length of the solution being constructed, and the problem becomes intractable for large sequences of instructions. This paper presents a new approach to superoptimizing stack-bytecode via three novel components: (1) a greedy algorithm to refine the bound on the length of the optimal solution; (2) a new representation of the optimization problem as a set of weighted soft clauses in MaxSAT; (3) a series of domain-specific dominance and redundant constraints to reduce the search space for optimal solutions. We have developed a tool, named SuperStack, which can be used to find optimal code translations of modern stack-based bytecode, namely WebAssembly or Ethereum bytecode. Experimental evaluation on more than 500,000 sequences shows the proposed greedy, constraint-based and SAT combination is able to greatly increase optimization gains achieved by existing superoptimizers and reduce to at least a fourth the optimization time.

Compiling Conditional Quantum Gates without Using Helper Qubits

We present a compilation scheme for conditional quantum gates. Our scheme compiles a multi-qubit conditional to a linear number of two-qubit conditionals. This can be done straightforwardly with helper qubits, but we show how to do it without using helper qubits and with much fewer gates than in previous work. Specifically, our scheme requires 1/3 as many gates as the previous best scheme without using helper qubits, which is essential for practical use. Our experiments show that several quantum-circuit optimizers have little impact on the compiled code from the previous best scheme, confirming the need for our new scheme. Our experiments with Grover's algorithm and quantum walk also show that our scheme has a major impact on the reliability of the compiled code.

Hyper Hoare Logic: (Dis-)Proving Program Hyperproperties

Hoare logics are proof systems that allow one to formally establish properties of computer programs. Traditional Hoare logics prove properties of individual program executions (such as functional correctness). Hoare logic has been generalized to prove also properties of multiple executions of a program (so-called hyperproperties, such as determinism or non-interference). These program logics prove the absence of (bad combinations of) executions. On the other hand, program logics similar to Hoare logic have been proposed to disprove program properties (e.g., Incorrectness Logic), by proving the existence of (bad combinations of) executions. All of these logics have in common that they specify program properties using assertions over a fixed number of states, for instance, a single pre- and post-state for functional properties or pairs of pre- and post-states for non-interference. In this paper, we present Hyper Hoare Logic, a generalization of Hoare logic that lifts assertions to properties of arbitrary sets of states. The resulting logic is simple yet expressive: its judgments can express arbitrary program hyperproperties, a particular class of hyperproperties over the set of terminating executions of a program (including properties of individual program executions). By allowing assertions to reason about sets of states, Hyper Hoare Logic can reason about both the absence and the existence of (combinations of) executions, and, thereby, supports both proving and disproving program (hyper-)properties within the same logic, including (hyper-)properties that no existing Hoare logic can express. We prove that Hyper Hoare Logic is sound and complete, and demonstrate that it captures important proof principles naturally. All our technical results have been proved in Isabelle/HOL.

Towards Trustworthy Automated Program Verifiers: Formally Validating Translations into an Intermediate Verification Language

Automated program verifiers are typically implemented using an intermediate verification language (IVL), such as Boogie or Why3. A verifier front-end translates the input program and specification into an IVL program, while the back-end generates proof obligations for the IVL program and employs an SMT solver to discharge them. Soundness of such verifiers therefore requires that the front-end translation faithfully captures the semantics of the input program and specification in the IVL program, and that the back-end reports success only if the IVL program is actually correct. For a verification tool to be trustworthy, these soundness conditions must be satisfied by its actual implementation, not just the program logic it uses. In this paper, we present a novel validation methodology that, given a formal semantics for the input language and IVL, provides formal soundness guarantees for front-end implementations. For each run of the verifier, we automatically generate a proof in Isabelle showing that the correctness of the produced IVL program implies the correctness of the input program. This proof can be checked independently from the verifier, in Isabelle, and can be combined with existing work on validating back-ends to obtain an end-to-end soundness result. Our methodology based on forward simulation employs several modularisation strategies to handle the large semantic gap between the input language and the IVL, as well as the intricacies of practical, optimised translations. We present our methodology for the widely-used Viper and Boogie languages. Our evaluation shows that it is effective in validating the translations performed by the existing Viper implementation.

Live Verification in an Interactive Proof Assistant

We present a prototype for a tool that enables programmers to verify their code as they write it in real-time. After each line of code that the programmer writes, the tool tells the programmer whether it was able to prove absence of undefined behavior so far, and it displays a concise representation of the symbolic state of the program right after the added line. The user can then either write the next line of code, or if needed or desired, write a specially marked comment that provides hints on how to solve side conditions or on how to represent the symbolic state more nicely. Once the programmer has finished writing the program, it is already verified with a mathematical correctness proof. Other tools providing real-time feedback already exist, but ours is the first one that only relies on a small trusted proof checker and that provides a concise summary of the symbolic state at the point in the program currently being edited, as opposed to only indicating whether user-stated assertions or postconditions hold. Program verification requires loop invariants, which are hard to find and tedious to spell out. We explore a middle ground in the design space between the two extremes of requiring users to spell out loop invariants manually and attempting to infer loop invariants automatically: Since a loop invariant often looks quite similar to the symbolic state right before the loop, our tool asks the user to express the desired loop invariant as a diff from the symbolic state before the loop, which has the potential to lead to shorter, more maintainable proofs. We prototyped our technique in the interactive proof assistant Coq, so our framework creates machine-checked proofs that the developed functions satisfy their specifications when executed according to the formal semantics of the source language. Using a verified compiler proven against the same source-language semantics, we can ensure that the behavior of the compiled program matches the program's behavior as represented by the framework during the proof. Additionally, since our polyglot source files can be viewed as Coq or C files at the same time, users willing to accept a larger trusted code base can compile them with GCC.

Bringing the WebAssembly Standard up to Speed with SpecTec

WebAssembly (Wasm) is a portable low-level bytecode language and virtual machine that has seen increasing use in a variety of ecosystems. Its specification is unusually rigorous – including a full formal semantics for the language – and every new feature must be specified in this formal semantics, in prose, and in the official reference interpreter before it can be standardized. With the growing size of the language, this manual process with its redundancies has become laborious and error-prone, and in this work, we offer a solution. We present SpecTec, a domain-specific language (DSL) and toolchain that facilitates both the Wasm specification and the generation of artifacts necessary to standardize new features. SpecTec serves as a single source of truth — from a SpecTec definition of the Wasm semantics, we can generate a typeset specification, including formal definitions and prose pseudocode descriptions, and a meta-level interpreter. Further backends for test generation and interactive theorem proving are planned. We evaluate SpecTec’s ability to represent the latest Wasm 2.0 and show that the generated meta-level interpreter passes 100% of the applicable official test suite. We show that SpecTec is highly effective at discovering and preventing errors by detecting historical errors in the specification that have been corrected and ten errors in five proposals ready for inclusion in the next version of Wasm. Our ultimate aim is that SpecTec should be adopted by the Wasm standards community and used to specify future versions of the standard.

Space-Efficient Polymorphic Gradual Typing, Mostly Parametric

Since the arrival of gradual typing, which allows partially typed code in a single program, efficient implementations of gradual typing have been an active research topic. In this paper, we study the space-efficiency problem of gradual typing in the presence of parametric polymorphism. Based on the existing work that showed the impossibility of a space-efficient implementation that supports fully parametric polymorphism, this paper will show that a space-efficient implementation is, in principle, possible by slightly relaxing parametricity. We first develop λCmp, which is a coercion calculus with mostly parametric polymorphism, and show its relaxed parametricity. Then, we present λSmp, a space-efficient version of λCmp, and prove that λSmp programs can be executed in a space-efficient manner and that translation from λCmp to λSmp is type- and semantics-preserving.

Quest Complete: The Holy Grail of Gradual Security

Languages with gradual information-flow control combine static and dynamic techniques to prevent security leaks. Gradual languages should satisfy the gradual guarantee: programs that only differ in the precision of their type annotations should behave the same modulo cast errors. Unfortunately, Toro et al. [2018] identify a tension between the gradual guarantee and information security; they were unable to satisfy both properties in the language GSLRef and had to settle for only satisfying information-flow security. Azevedo de Amorim et al. [2020] show that by sacrificing type-guided classification, one obtains a language that satisfies both noninterference and the gradual guarantee. Bichhawat et al. [2021] show that both properties can be satisfied by sacrificing the no-sensitive-upgrade mechanism, replacing it with a static analysis. In this paper we present a language design, 𝜆★IFC, that satisfies both noninterference and the gradual guarantee without making any sacrifices. We keep the type-guided classification of GSLRef and use the standard no-sensitive-upgrade mechanism to prevent implicit flows through mutable references. The key to the design of 𝜆★IFC is to walk back the decision in GSLRef to include the unknown label ★ among the runtime security labels. We give a formal definition of 𝜆★IFC, prove the gradual guarantee, and prove noninterference. Of technical note, the semantics of 𝜆★IFC is the first gradual information-flow control language to be specified using coercion calculi (a la Henglein), thereby expanding the coercion-based theory of gradual typing.

Compatible Branch Coverage Driven Symbolic Execution for Efficient Bug Finding

Symbolic execution is a powerful technique for bug finding by generating test inputs to systematically explore all feasible paths within a given threshold. However, its practical usage is often limited by the path explosion problem. In this paper, we propose compatible branch coverage driven symbolic execution for efficient bug finding. Our new technique owns a novel path-pruning strategy obtained from program dependency analysis to effectively avoid unnecessary explorations. Specifically, based on a Compatible Branch Set, our technique directs symbolic execution to explore feasible branches while soundly pruning redundant paths that have no new contributions to branch coverage. We have implemented our approach atop KLEE and conducted experiments on a set of programs from Siemens Suite, GNU Coreutils, and other real-world programs. Experimental results show that, compared with the state-of-the-art symbolic execution techniques, our approach always uses significantly less time to reproduce bugs while achieving the same or better branch coverage. On average, our approach got over 45% path reduction and 3x speedup on the GNU Coreutils programs.

RichWasm: Bringing Safe, Fine-Grained, Shared-Memory Interoperability Down to WebAssembly

Safe, shared-memory interoperability between languages with different type systems and memory-safety guarantees is an intricate problem as crossing language boundaries may result in memory-safety violations. In this paper, we present RichWasm, a novel richly typed intermediate language designed to serve as a compilation target for typed high-level languages with different memory-safety guarantees. RichWasm is based on WebAssembly and enables safe shared-memory interoperability by incorporating a variety of type features that support fine-grained memory ownership and sharing. RichWasm is rich enough to serve as a typed compilation target for both typed garbage-collected languages and languages with an ownership-based type system and manually managed memory. We demonstrate this by providing compilers from core ML and L3, a type-safe language with strong updates, to RichWasm. RichWasm is compiled to regular Wasm, allowing for use in existing environments. We formalize RichWasm in Coq and prove type safety.

SpEQ: Translation of Sparse Codes using Equivalences

We present SpEQ, a quick and correct strategy for detecting semantics in sparse codes and enabling automatic translation to high-performance library calls or domain-specific languages (DSLs). When sparse linear algebra codes contain implicit preconditions about how data is stored that hamper direct translation, SpEQ identifies the high-level computation along with storage details and related preconditions. A run-time check guards the translation and ensures that required preconditions are met. We implement SpEQ using the LLVM framework, the Z3 solver, and egglog library and correctly translate sparse linear algebra codes into two high-performance libraries, NVIDIA cuSPARSE and Intel MKL, and OpenMP (OMP). We evaluate SpEQ on ten diverse benchmarks against two state-of-the-art translation tools. SpEQ achieves geometric mean speedups of 3.25×, 5.09×, and 8.04× on OpenMP, MKL, and cuSPARSE backends, respectively. SpEQ is the only tool that can guarantee the correct translation of sparse computations.

Foundational Integration Verification of a Cryptographic Server

We present verification of a bare-metal server built using diverse implementation techniques and languages against a whole-system input-output specification in terms of machine code, network packets, and mathematical specifications of elliptic-curve cryptography. We used very different formal-reasoning techniques throughout the stack, ranging from computer algebra, symbolic execution, and verification-condition generation to interactive verification of functional programs including compilers for C-like and functional languages. All these component specifications and domain-specific reasoning techniques are defined and justified against common foundations in the Coq proof assistant. Connecting these components is a minimalistic specification style based on functional programs and assertions over simple objects, omnisemantics for program execution, and basic separation logic for memory layout. This design enables us to bring the components together in a top-level correctness theorem that can be audited without understanding or trusting the internal interfaces and tools. Our case study is a simple cryptographic server for flipping of a bit of state through public-key authenticated network messages, and its proof shows total functional correctness including static bounds on memory usage. This paper also describes our experiences with the specific verification tools we build upon, along with detailed analysis of reasons behind the widely varying levels of productivity we experienced between combinations of tools and tasks.

Reward-Guided Synthesis of Intelligent Agents with Control Structures

Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model causes it less effective in exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experiment results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation.

Compiling Probabilistic Programs for Variable Elimination with Information Flow

A key promise of probabilistic programming is the ability to specify rich models using an expressive program- ming language. However, the expressive power that makes probabilistic programming languages enticing also poses challenges to inference, so much so that specialized approaches to inference ban language features such as recursion. We present an approach to variable elimination and marginal inference for probabilistic programs featuring bounded recursion, discrete distributions, and sometimes continuous distributions. A compiler eliminates probabilistic side effects, using a novel information-flow type system to factorize probabilistic computations and hoist independent subcomputations out of sums or integrals. For a broad class of recursive programs with dynamically recurring substructure, the compiler effectively decomposes a global marginal-inference problem, which may otherwise be intractable, into tractable subproblems. We prove the compilation correct by showing that it preserves denotational semantics. Experiments show that the compiled programs subsume widely used PTIME algorithms for recursive models and that the compilation time scales with the size of the inference problems. As a separate contribution, we develop a denotational, logical-relations model of information-flow types in the novel measure-theoretic setting of probabilistic programming; we use it to prove noninterference and consequently the correctness of variable elimination.

SPORE: Combining Symmetry and Partial Order Reduction

Symmetry reduction (SR) and partial order reduction (POR) aim to scale up model checking by exploiting the underlying program structure: SR avoids exploring executions equivalent up to some permutation of symmetric threads, while POR avoids exploring executions equivalent up to reordering of independent instructions. While both SR and POR have been well studied individually, their combination in the context of stateless model checking has remained an open problem. In this paper, we present SPORE, the first stateless model checker that combines SR and POR in a sound, complete and optimal manner. SPORE can leverage both symmetries in the client program itself, but also internal symmetries in the underlying implementation (i.e., idempotent operations), a novel symmetry notion we introduce in this paper. Our experiments confirm that SPORE explores drastically fewer executions than tools that solely employ SR/POR, thereby greatly advancing the state-of-the-art.

Predictable Verification using Intrinsic Definitions

We propose a novel mechanism of defining data structures using intrinsic definitions that avoids recursion and instead utilizes monadic maps satisfying local conditions. We show that intrinsic definitions are a powerful mechanism that can capture a variety of data structures naturally. We show that they also enable a predictable verification methodology that allows engineers to write ghost code to update monadic maps and perform verification using reduction to decidable logics. We evaluate our methodology using Boogie and prove a suite of data structure manipulating programs correct.

Context-Free Language Reachability via Skewed Tabulation

Context-free language reachability (CFL-reachability) is a prominent model for formulating program analysis problems. Almost all CFL-reachability algorithms are based on the Reps-Horwitz-Sagiv (RHS) tabulation. In essence, the RHS tabulation, based on normalized context-free grammars, is similar to the CYK algorithm for CFL-parsing. Consider a normalized rule S ::= A B and a CFL-reachability problem instance of computing S-edges in the input graph. The RHS tabulation obtains all summary edges (i.e., S-, A-, and B-edges) based on the grammar rules. However, many A- and B-edges are wasted because only a subset of those edges eventually contributes to generating S-edges in the input graph. This paper proposes a new tabulation strategy for speeding up CFL-reachability by eliminating wasted and unnecessary summary edges. We particularly focus on recursive nonterminals. Our key technical insight is that the wasted edge generations and insertions caused by recursive nonterminals can be avoided by modifying the parse trees either statically (by transforming the grammar) or dynamically (using a specialized online CFL-reachability solver). For example, if a recursive nonterminal B, generated by a rule B ::= B X, appears on the right-hand side of a rule S ::= A B, we can make S recursive (by introducing a new rule S ::= S X) and eliminate the original recursive rule (B ::= B X). Due to the rule S ::= S X, the shapes of the parse trees associated with the left-hand-side nonterminal S become more "skewed". Thus, we name our approach skewed tabulation for CFL-reachability. Skewed tabulation can significantly improve the scalability of CFL-reachability by reducing wasted and unnecessary summary edges. We have implemented skewed tabulation and applied the corresponding CFL-reachability algorithm to an alias analysis, a value-flow analysis, and a taint analysis. Our extensive evaluation based on SPEC 2017 benchmarks yields promising results. For the three client analyses, CFL-reachability based on skewed tabulation can achieve 3.34×, 1.13× and 2.05× speedup over the state-of-the-art RHS-tabulation-based CFL-reachability solver and consume 60.05%, 20.38% and 63.06% less memory, respectively. Furthermore, the cost of grammar transformation for skewed tabulation is negligible, typically taking less than one second.

Falcon: A Scalable Analytical Cache Model

Compilers often use performance models to decide how to optimize code. This is often preferred over using hardware performance measurements, since hardware measurements can be expensive, limited by hardware availability, and makes the output of compilation non-deterministic. Analytical models, on the other hand, serve as efficient and noise-free performance indicators. Since many optimizations focus on improving memory performance, memory cache miss rate estimations can serve as an effective and noise-free performance indicator for superoptimizers, worst-case execution time analyses, manual program optimization, and many other performance-focused use cases. Existing methods to model the cache behavior of affine programs work on small programs such as those in the Polybench benchmark but do not scale to the larger programs we would like to optimize in production, which can be orders of magnitude bigger by lines of code. These analytical approaches hand of the whole program to a Presburger solver and perform expensive mathematical operations on the huge resulting formulas. We develop a scalable cache model for affine programs that splits the computation into smaller pieces that do not trigger the worst-case asymptotic behavior of these solvers. We evaluate our approach on 46 TorchVision neural networks, finding that our model has a geomean runtime of 44.9 seconds compared to over 32 minutes for the state-of-the-art prior cache model, and the latter is actually smaller than the true value because the prior model reached our four hour time limit on 54% of the networks, and this limit was never reached by our tool. Our model exploits parallelism effectively: running it on sixteen cores is 8.2x faster than running it single-threaded. While the state-of-the-art model takes over four hours to analyze a majority of the benchmark programs, Falcon produces results in at most 3 minutes and 3 seconds; moreover, after a local modification to the program being analyzed, our model efficiently updates the predictions in 513 ms on average (geomean). Thus, we provide the first scalable analytical cache model.

Equivalence by Canonicalization for Synthesis-Backed Refactoring

We present an enumerative program synthesis framework called component-based refactoring that can refactor "direct" style code that does not use library components into equivalent "combinator" style code that does use library components. This framework introduces a sound but incomplete technique to check the equivalence of direct code and combinator code called equivalence by canonicalization that does not rely on input-output examples or logical specifications. Moreover, our approach can repurpose existing compiler optimizations, leveraging decades of research from the programming languages community. We instantiated our new synthesis framework in two contexts: (i) higher-order functional combinators such as map and filter in the statically-typed functional programming language Elm and (ii) high-performance numerical computing combinators provided by the NumPy library for Python. We implemented both instantiations in a tool called Cobbler and evaluated it on thousands of real programs to test the performance of the component-based refactoring framework in terms of execution time and output quality. Our work offers evidence that synthesis-backed refactoring can apply across a range of domains without specification beyond the input program.

KATch: A Fast Symbolic Verifier for NetKAT

We develop new data structures and algorithms for checking verification queries in NetKAT, a domain-specific language for specifying the behavior of network data planes. Our results extend the techniques obtained in prior work on symbolic automata and provide a framework for building efficient and scalable verification tools. We present KATch, an implementation of these ideas in Scala, featuring an extended set of NetKAT operators that are useful for expressing network-wide specifications, and a verification engine that constructs a bisimulation or generates a counter-example showing that none exists. We evaluate the performance of our implementation on real-world and synthetic benchmarks, verifying properties such as reachability and slice isolation, typically returning a result in well under a second, which is orders of magnitude faster than previous approaches. Our advancements underscore NetKAT's potential as a practical, declarative language for network specification and verification.

Hyperblock Scheduling for Verified High-Level Synthesis

High-level synthesis (HLS) is the automatic compilation of software programs into custom hardware designs. With programmable hardware devices (such as FPGAs) now widespread, HLS is increasingly relied upon, but existing HLS tools are too unreliable for safety- and security-critical applications. Herklotz et al. partially addressed this concern by building Vericert, a prototype HLS tool that is proven correct in Coq (à la CompCert), but it cannot compete performance-wise with unverified tools. This paper reports on our efforts to close this performance gap, thus obtaining the first practical verified HLS tool. We achieve this by implementing a flexible operation scheduler based on hyperblocks (basic blocks of predicated instructions) that supports operation chaining (packing dependent operations into a single clock cycle). Correctness is proven via translation validation: each schedule is checked using a Coq-verified validator that uses a SAT solver to reason about predicates. Avoiding exponential blow-up in this validation process is a key challenge, which we address by using final-state predicates and value summaries. Experiments on the PolyBench/C suite indicate that scheduling makes Vericert-generated hardware 2.1× faster, thus bringing Vericert into competition with a state-of-the-art open-source HLS tool when a similar set of optimisations is enabled.

Numerical Fuzz: A Type System for Rounding Error Analysis

Algorithms operating on real numbers are implemented as floating-point computations in practice, but floating-point operations introduce roundoff errors that can degrade the accuracy of the result. We propose Λnum, a functional programming language with a type system that can express quantitative bounds on roundoff error. Our type system combines a sensitivity analysis, enforced through a linear typing discipline, with a novel graded monad to track the accumulation of roundoff errors. We prove that our type system is sound by relating the denotational semantics of our language to the exact and floating-point operational semantics.

To demonstrate our system, we instantiate Λnum with error metrics proposed in the numerical analysis literature and we show how to incorporate rounding operations that faithfully model aspects of the IEEE 754 floating-point standard. To show that Λnum can be a useful tool for automated error analysis, we develop a prototype implementation for Λnum that infers error bounds that are competitive with existing tools, while often running significantly faster. Finally, we consider semantic extensions of our graded monad to bound error under more complex rounding behaviors, such as non-deterministic and randomized rounding.

Inductive Approach to Spacer

The constrained Horn clause satisfiability problem is at the core of many automated verification methods, and Spacer is one of the most efficient solvers of this problem. The standard description of Spacer is based on an abstract transition system, dividing the whole procedure into small rules. This division makes individual rules easier to understand but, conversely, makes it difficult to discuss the procedure as a whole. As evidence of the difficulty in understanding the whole procedure, we point out that the claimed refutational completeness actually fails for several reasons, some of which were not present in the original version and subsequently added. It is also difficult to grasp the differences between Spacer and another procedure, such as GPDR. This paper aims to provide a better understanding of Spacer by developing a Spacer-like procedure defined by structural induction. We first formulate the problem to be solved inductively, then give its naïve solver and transform it to obtain a Spacer-like procedure. Interestingly, our inductive approach almost unifies Spacer and GPDR, which differ in only one respect in our understanding. To demonstrate the usefulness of our inductive approach in understanding Spacer, we examine Spacer variants in the literature in terms of inductive procedures and discuss why they are not refutationally complete and how to fix them. We also implemented the proposed procedure and evaluated it experimentally.

V-Star: Learning Visibly Pushdown Grammars from Program Inputs

Accurate description of program inputs remains a critical challenge in the field of programming languages. Active learning, as a well-established field, achieves exact learning for regular languages. We offer an innovative grammar inference tool, V-Star, based on the active learning of visibly pushdown automata. V-Star deduces nesting structures of program input languages from sample inputs, employing a novel inference mechanism based on nested patterns. This mechanism identifies token boundaries and converts languages such as XML documents into VPLs. We then adapted Angluin's L-Star, an exact learning algorithm, for VPA learning, which improves the precision of our tool. Our evaluation demonstrates that V-Star effectively and efficiently learns a variety of practical grammars, including S-Expressions, JSON, and XML, and outperforms other state-of-the-art tools.

Hashing Modulo Context-Sensitive 𝛼-Equivalence

The notion of α-equivalence between λ-terms is commonly used to identify terms that are considered equal. However, due to the primitive treatment of free variables, this notion falls short when comparing subterms occurring within a larger context. Depending on the usage of the Barendregt convention (choosing different variable names for all involved binders), it will equate either too few or too many subterms. We introduce a formal notion of context-sensitive α-equivalence, where two open terms can be compared within a context that resolves their free variables. We show that this equivalence coincides exactly with the notion of bisimulation equivalence. Furthermore, we present an efficient O(nlogn) runtime hashing scheme that identifies λ-terms modulo context-sensitive α-equivalence, generalizing over traditional bisimulation partitioning algorithms and improving upon a previously established O(nlog2 n) bound for a hashing modulo ordinary α-equivalence by Maziarz et al. Hashing λ-terms is useful in many applications that require common subterm elimination and structure sharing. We hav employed the algorithm to obtain a large-scale, densely packed, interconnected graph of mathematical knowledge from the Coq proof assistant for machine learning purposes.

Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments

Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also present stsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic search even for previously unparsable queries. We show empirically that stsearch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings.

Static Analysis for Checking the Disambiguation Robustness of Regular Expressions

Regular expressions are commonly used for finding and extracting matches from sequence data. Due to the inherent ambiguity of regular expressions, a disambiguation policy must be considered for the match extraction problem, in order to uniquely determine the desired match out of the possibly many matches. The most common disambiguation policies are the POSIX policy and the greedy (PCRE) policy. The POSIX policy chooses the longest match out of the leftmost ones. The greedy policy chooses a leftmost match and further disambiguates using a greedy interpretation of Kleene iteration to match as many times as possible. The choice of disambiguation policy can affect the output of match extraction, which can be an issue for reusing regular expressions across regex engines. In this paper, we introduce and study the notion of disambiguation robustness for regular expressions. A regular expression is robust if its extraction semantics is indifferent to whether the POSIX or greedy disambiguation policy is chosen. This gives rise to a decision problem for regular expressions, which we prove to be PSPACE-complete. We propose a static analysis algorithm for checking the (non-)robustness of regular expressions and two performance optimizations. We have implemented the proposed algorithms and we have shown experimentally that they are practical for analyzing large datasets of regular expressions derived from various application domains.

Equivalence and Similarity Refutation for Probabilistic Programs

We consider the problems of statically refuting equivalence and similarity of output distributions defined by a pair of probabilistic programs. Equivalence and similarity are two fundamental relational properties of probabilistic programs that are essential for their correctness both in implementation and in compilation. In this work, we present a new method for static equivalence and similarity refutation. Our method refutes equivalence and similarity by computing a function over program outputs whose expected value with respect to the output distributions of two programs is different. The function is computed simultaneously with an upper expectation supermartingale and a lower expectation submartingale for the two programs, which we show to together provide a formal certificate for refuting equivalence and similarity. To the best of our knowledge, our method is the first approach to relational program analysis to offer the combination of the following desirable features: (1) it is fully automated, (2) it is applicable to infinite-state probabilistic programs, and (3) it provides formal guarantees on the correctness of its results. We implement a prototype of our method and our experiments demonstrate the effectiveness of our method to refute equivalence and similarity for a number of examples collected from the literature.

Probabilistic Programming with Programmable Variational Inference

Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper, we propose a more modular approach to supporting variational inference in PPLs, based on compositional program transformation. In our approach, variational objectives are expressed as programs, that may employ first-class constructs for computing densities of and expected values under user-defined models and variational families. We then transform these programs systematically into unbiased gradient estimators for optimizing the objectives they define. Our design makes it possible to prove unbiasedness by reasoning modularly about many interacting concerns in PPL implementations of variational inference, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. Additionally, relative to existing support for VI in PPLs, our design increases expressiveness along three axes: (1) it supports an open-ended set of user-defined variational objectives, rather than a fixed menu of options; (2) it supports a combinatorial space of gradient estimation strategies, many not automated by today’s PPLs; and (3) it supports a broader class of models and variational families, because it supports constructs for approximate marginalization and normalization (previously introduced for Monte Carlo inference). We implement our approach in an extension to the Gen probabilistic programming system (, implemented in JAX), and evaluate our automation on several deep generative modeling tasks, showing minimal performance overhead vs. hand-coded implementations and performance competitive with well-established open-source PPLs.

PL4XGL: A Programming Language Approach to Explainable Graph Learning

In this article, we present a new, language-based approach to explainable graph learning. Though graph neural networks (GNNs) have shown impressive performance in various graph learning tasks, they have severe limitations in explainability, hindering their use in decision-critical applications. To address these limitations, several GNN explanation techniques have been proposed using a post-hoc explanation approach providing subgraphs as explanations for classification results. Unfortunately, however, they have two fundamental drawbacks in terms of additional explanation costs and 2) the correctness of the explanations. This paper aims to address these problems by developing a new graph-learning method based on programming language techniques. Our key idea is two-fold: 1) designing a graph description language (GDL) to explain the classification results and 2) developing a new GDL-based interpretable classification model instead of GNN-based models. Our graph-learning model, called PL4XGL, consists of a set of candidate GDL programs with labels and quality scores. For a given graph component, it searches the best GDL program describing the component and provides the corresponding label as the classification result and the program as the explanation. In our approach, learning from data is formulated as a program-synthesis problem, and we present top-down and bottom-up algorithms for synthesizing GDL programs from training data. Evaluation using widely-used datasets demonstrates that PL4XGL produces high-quality explanations that outperform those produced by the state-of-the-art GNN explanation technique, SubgraphX. We also show that PL4XGL achieves competitive classification accuracy comparable to popular GNN models.

A Family of Fast and Memory Efficient Lock- and Wait-Free Reclamation

Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom, but is less memory efficient and performs suboptimally in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency, but lacks wait-freedom.

We present a family of non-blocking memory reclamation schemes, called Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline can guarantee complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures (an almost) balanced reclamation workload across all threads. The latter two properties result in Crystalline's high performance and memory efficiency. Simultaneously ensuring all three properties requires overcoming unique challenges. Crystalline supports ubiquitous x86-64 and ARM64 architectures, while achieving superior throughput than prior fast schemes such as EBR as the number of threads grows.

We also accentuate that many recent approaches, unlike HP, lack strict non-blocking guarantees when used with multiple data structures. By providing full wait-freedom, Crystalline addresses this problem as well.