Proceedings of the ACM on Programming Languages: Vol. 8, No. OOPSLA1. 2024

Full Citation in the ACM Digital Library


Quantum Control Machine: The Limits of Control Flow in Quantum Programming

Quantum algorithms for tasks such as factorization, search, and simulation rely on control flow such as branching and iteration that depends on the value of data in superposition. High-level programming abstractions for control flow, such as switches, loops, higher-order functions, and continuations, are ubiquitous in classical languages. By contrast, many quantum languages do not provide high-level abstractions for control flow in superposition, and instead require the use of hardware-level logic gates to implement such control flow.

The reason for this gap is that whereas a classical computer supports control flow abstractions using a program counter that can depend on data, the typical architecture of a quantum computer does not analogously provide a program counter that can depend on data in superposition. As a result, the complete set of control flow abstractions that can be correctly realized on a quantum computer has not yet been established.

In this work, we provide a complete characterization of the properties of control flow abstractions that are correctly realizable on a quantum computer. First, we prove that even on a quantum computer whose program counter exists in superposition, one cannot correctly realize control flow in quantum algorithms by lifting the classical conditional jump instruction to work in superposition. This theorem denies the ability to directly lift general abstractions for control flow such as the λ-calculus from classical to quantum programming.

In response, we present the necessary and sufficient conditions for control flow to be correctly realizable on a quantum computer. We introduce the quantum control machine, an instruction set architecture featuring a conditional jump that is restricted to satisfy these conditions. We show how this design enables a developer to correctly express control flow in quantum algorithms using a program counter in place of logic gates.

Profiling Programming Language Learning

This paper documents a year-long experiment to “profile” the process of learning a programming language: gathering data to understand what makes a language hard to learn, and using that data to improve the learning process. We added interactive quizzes to The Rust Programming Language, the official textbook for learning Rust. Over 13 months, 62,526 readers answered questions 1,140,202 times. First, we analyze the trajectories of readers. We find that many readers drop-out of the book early when faced with difficult language concepts like Rust’s ownership types. Second, we use classical test theory and item response theory to analyze the characteristics of quiz questions. We find that better questions are more conceptual in nature, such as asking why a program does not compile vs. whether a program compiles. Third, we performed 12 interventions into the book to help readers with difficult questions. We find that on average, interventions improved quiz scores on the targeted questions by +20

Synthetiq: Fast and Versatile Quantum Circuit Synthesis

To implement quantum algorithms on quantum computers it is crucial to decompose their operators into the limited gate set supported by those computers. Unfortunately, existing works automating this essential task are generally slow and only applicable to narrow use cases.We present Synthetiq, a method to synthesize quantum circuits implementing a given specification over arbitrary finite gate sets, which is faster and more versatile than existing works. Synthetiq utilizes Simulated Annealing instantiated with a novel, domain-specific energy function that allows developers to leverage partial specifications for better efficiency. Synthetiq further couples this synthesis method with a custom simplification pass, to ensure efficiency of the found circuits.

We experimentally demonstrate that Synthetiq can generate better implementations than were previously known for multiple relevant quantum operators including RCCCX, CCT, CCiSWAP, C√SWAP, and C√iSWAP. Our extensive evaluation also demonstrates Synthetiq frequently outperforms a wide variety of more specialized tools in their own domains, including (i) ‍the well-studied task of synthesizing fully specified operators in the Clifford+T gate set, (ii) ‍є-approximate synthesis of multi-qubit operators in the same gate set, and (iii) ‍synthesis tasks with custom gate sets. On all those tasks, Synthetiq is typically one to two orders of magnitude faster than previous state-of-the-art and can tackle problems that were previously out of the reach of any synthesis tool.

A Learning-Based Approach to Static Program Slicing

Traditional program slicing techniques are crucial for early bug detection and manual/automated debugging of online code snippets. Nevertheless, their inability to handle incomplete code hinders their real-world applicability in such scenarios. To overcome these challenges, we present NS-Slicer, a novel learning-based approach that predicts static program slices for both complete and partial code Our tool leverages a pre-trained language model to exploit its understanding of fine-grained variable-statement dependencies within source code. With this knowledge, given a variable at a specific location and a statement in a code snippet, NS-Slicer determines whether the statement belongs to the backward slice or forward slice, respectively. We conducted a series of experiments to evaluate NS-Slicer's performance. On complete code, it predicts the backward and forward slices with an F1-score of 97.41% and 95.82%, respectively, while achieving an overall F1-score of 96.77%. Notably, in 85.20% of the cases, the static program slices predicted by NS-Slicer exactly match entire slices from the oracle. For partial programs, it achieved an F1-score of 96.77%–97.49% for backward slicing, 92.14%–95.40% for forward slicing, and an overall F1-score of 94.66%–96.62%. Furthermore, we demonstrate NS-Slicer's utility in vulnerability detection (VD), integrating its predicted slices into an automated VD tool. In this setup, the tool detected vulnerabilities in Java code with a high F1-score of 73.38%. We also include the analyses studying NS-Slicer’s promising performance and limitations, providing insights into its understanding of intrinsic code properties such as variable aliasing, leading to better slicing.

Finding Cross-Rule Optimization Bugs in Datalog Engines

Datalog is a popular and widely-used declarative logic programming language. Datalog engines apply many cross-rule optimizations; bugs in them can cause incorrect results. To detect such optimization bugs, we propose an automated testing approach called Incremental Rule Evaluation (IRE), which synergistically tackles the test oracle and test case generation problem. The core idea behind the test oracle is to compare the results of an optimized program and a program without cross-rule optimization; any difference indicates a bug in the Datalog engine. Our core insight is that, for an optimized, incrementally-generated Datalog program, we can evaluate all rules individually by constructing a reference program to disable the optimizations that are performed among multiple rules. Incrementally generating test cases not only allows us to apply the test oracle for every new rule generated—we also can ensure that every newly added rule generates a non-empty result with a given probability and eschew recomputing already-known facts. We implemented IRE as a tool named Deopt, and evaluated Deopt on four mature Datalog engines, namely Soufflé, CozoDB, μZ, and DDlog, and discovered a total of 30 bugs. Of these, 13 were logic bugs, while the remaining were crash and error bugs. Deopt can detect all bugs found by queryFuzz, a state-of-the-art approach. Out of the bugs identified by Deopt, queryFuzz might be unable to detect 5. Our incremental test case generation approach is efficient; for example, for test cases containing 60 rules, our incremental approach can produce 1.17× (for DDlog) to 31.02× (for Soufflé) as many valid test cases with non-empty results as the naive random method. We believe that the simplicity and the generality of the approach will lead to its wide adoption in practice.

UniSparse: An Intermediate Language for General Sparse Format Customization

The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by utilizing sparsity pattern- or target-aware data structures and layouts to enhance memory access latency and bandwidth utilization. However, existing sparse tensor programming models and compilers offer little or no support for productively customizing the sparse formats. Additionally, because these frameworks represent formats using a limited set of per-dimension attributes, they lack the flexibility to accommodate numerous new variations of custom sparse data structures and layouts. To overcome this deficiency, we propose UniSparse, an intermediate language that provides a unified abstraction for representing and customizing sparse formats. Unlike the existing attribute-based frameworks, UniSparse decouples the logical representation of the sparse tensor (i.e., the data structure) from its low-level memory layout, enabling the customization of both. As a result, a rich set of format customizations can be succinctly expressed in a small set of well-defined query, mutation, and layout primitives. We also develop a compiler leveraging the MLIR infrastructure, which supports adaptive customization of formats, and automatic code generation of format conversion and compute operations for heterogeneous architectures. We demonstrate the efficacy of our approach through experiments running commonly-used sparse linear algebra operations with specialized formats on multiple different hardware targets, including an Intel CPU, an NVIDIA GPU, an AMD Xilinx FPGA, and a simulated processing-in-memory (PIM) device.

Cocoon: Static Information Flow Control in Rust

Information flow control (IFC) provides confidentiality by enforcing noninterference, which ensures that high-secrecy values cannot affect low-secrecy values. Prior work introduces fine-grained IFC approaches that modify the programming language and use non-standard compilation tools, impose run-time overhead, or report false secrecy leaks—all of which hinder adoption. This paper presents Cocoon, a Rust library for static type-based IFC that uses the unmodified Rust language and compiler. The key insight of Cocoon lies in leveraging Rust’s type system and procedural macros to establish an effect system that enforces noninterference. A performance evaluation shows that using Cocoon increases compile time but has no impact on application performance. To demonstrate Cocoon’s utility, we retrofitted two popular Rust programs, the Spotify TUI client and Mozilla’s Servo browser engine, to use Cocoon to enforce limited confidentiality policies

Fulfilling OCaml Modules with Transparency

ML modules come as an additional layer on top of a core language to offer large-scale notions of composition and abstraction. They largely contributed to the success of OCaml and SML. While modules are easy to write for common cases, their advanced use may become tricky. Additionally, despite a long line of works, their meta-theory remains difficult to comprehend, with involved soundness proofs. In fact, the module layer of OCaml does not currently have a formal specification and its implementation has some surprising behaviors.

Building on previous translations from ML modules to Fω, we propose a type system, called Mω, that covers a large subset of OCaml modules, including both applicative and generative functors, and extended with transparent ascription. This system produces signatures in an OCaml-like syntax extended with Fω quantifiers. We provide a reverse translation from Mω signatures to path-based source signatures along with a characterization of signature avoidance cases, making Mω signatures well suited to serve as a new internal representation for a typechecker.

The soundness of the type system is shown by elaboration in Fω. We improve over previous encodings of sealing within applicative functors, by the introduction of transparent existential types, a weaker form of existential types that can be lifted out of universal and arrow types. This shines a new light on the form of abstraction provided by applicative functors and brings their treatment much closer to those of generative functors.

HOL4P4: Mechanized Small-Step Semantics for P4

We present the first semantics of the network data plane programming language P4 able to adequately capture all key features of P416, the most recent version of P4, including external functions (externs) and concurrency. These features are intimately related since, in P4, extern invocations are the only points at which one execution thread can affect another. Reflecting P4’s lack of a general-purpose memory and the presence of multithreading the semantics is given in small-step style and eschews the use of a heap. In addition to the P4 language itself, we provide an architectural level semantics, which allows the composition of P4-programmed blocks, models end-to-end packet processing, and can take into account features such as arbitration and packet recirculation. A corresponding type system is provided with attendant progress, preservation, and type-soundness theorems. Semantics, type system, and meta-theory are formalized in the HOL4 theorem prover. From this formalization, we derive a HOL4 executable semantics that supports verified execution of programs with partially symbolic packets able to validate simple end-to-end program properties.

Compiling Recurrences over Dense and Sparse Arrays

We present a framework for compiling recurrence equations into native code. In our framework, users specify a system of recurrences, the types of data structures that store inputs and outputs, and scheduling commands for optimization. Our compiler then lowers these specifications into native code that respects the dependencies in the recurrence equations. Our compiler can generate code over both sparse and dense data structures, and determines if the recurrence system is solvable with the provided scheduling primitives. We evaluate the performance and correctness of the generated code on several recurrences, from domains as diverse as dense and sparse matrix solvers, dynamic programming, graph problems, and sparse tensor algebra. We demonstrate that the generated code has competitive performance to hand-optimized implementations in libraries. However, these handwritten libraries target specific recurrences, specific data structures, and specific optimizations. Our system, on the other hand, automatically generates implementations from recurrences, data formats, and schedules, giving our system more generality than library approaches.

Outcome Separation Logic: Local Reasoning for Correctness and Incorrectness with Computational Effects

Separation logic’s compositionality and local reasoning properties have led to significant advances in scalable static analysis. But program analysis has new challenges—many programs display computational effects and, orthogonally, static analyzers must handle incorrectness too. We present Outcome Separation Logic (OSL), a program logic that is sound for both correctness and incorrectness reasoning in programs with varying effects. OSL has a frame rule—just like separation logic—but uses different underlying assumptions that open up local reasoning to a larger class of properties than can be handled by any single existing logic.

Building on this foundational theory, we also define symbolic execution algorithms that use bi-abduction to derive specifications for programs with effects. This involves a new tri-abduction procedure to analyze programs whose execution branches due to effects such as nondeterministic or probabilistic choice. This work furthers the compositionality promised by separation logic by opening up the possibility for greater reuse of analysis tools across two dimensions: bug-finding vs verification in programs with varying effects.

Newtonian Program Analysis of Probabilistic Programs

Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic programs with unstructured control-flow, nondeterminism, and general recursion. NPA-PMA is based on Newtonian Program Analysis (NPA), a generalization of Newton's method to solve equation systems over semirings. The key challenge for developing NPA-PMA is to handle multiple kinds of confluences in both the algebraic structures that specify analyses and the equation systems that encode control flow: semirings support a single confluence operation, whereas NPA-PMA involves three confluence operations (conditional, probabilistic, and nondeterministic).

Our work introduces ω-continuous pre-Markov algebras (ωPMAs) to factor out common parts of different analyses; adopts regular infinite-tree expressions to encode probabilistic programs with unstructured control-flow; and presents a linearization method that makes Newton's method applicable to the setting of regular-infinite-tree equations over ωPMAs. NPA-PMA allows analyses to supply a non-iterative strategy to solve linearized equations. Our experimental evaluation demonstrates that (i) NPA-PMA holds considerable promise for outperforming Kleene iteration, and (ii) provides great generality for designing program analyses.

Identifying and Correcting Programming Language Behavior Misconceptions

Misconceptions about core linguistic concepts like mutable variables, mutable compound data, and their interaction with scope and higher-order functions seem to be widespread. But how do we detect them, given that experts have blind spots and may not realize the myriad ways in which students can misunderstand programs? Furthermore, once identified, what can we do to correct them?

In this paper, we present a curated list of misconceptions, and an instrument to detect them. These are distilled from student work over several years and match and extend prior research. We also present an automated, self-guided tutoring system. The tutor builds on strategies in the education literature and is explicitly designed around identifying and correcting misconceptions.

We have tested the tutor in multiple settings. Our data consistently show that (a) the misconceptions we tackle are widespread, and (b) the tutor appears to improve understanding.

Quantitative Bounds on Resource Usage of Probabilistic Programs

Cost analysis, also known as resource usage analysis, is the task of finding bounds on the total cost of a program and is a well-studied problem in static analysis. In this work, we consider two classical quantitative problems in cost analysis for probabilistic programs. The first problem is to find a bound on the expected total cost of the program. This is a natural measure for the resource usage of the program and can also be directly applied to average-case runtime analysis. The second problem asks for a tail bound, i.e. ‍given a threshold t the goal is to find a probability bound p such that ℙ[total cost ≥ t] ≤ p. Intuitively, given a threshold t on the resource, the problem is to find the likelihood that the total cost exceeds this threshold.

First, for expectation bounds, a major obstacle in previous works on cost analysis is that they can handle only non-negative costs or bounded variable updates. In contrast, we provide a new variant of the standard notion of cost martingales, that allows us to find expectation bounds for a class of programs with general positive or negative costs and no restriction on the variable updates. More specifically, our approach is applicable as long as there is a lower bound on the total cost incurred along every path.

Second, for tail bounds, all previous methods are limited to programs in which the expected total cost is finite. In contrast, we present a novel approach, based on a combination of our martingale-based method for expectation bounds with a quantitative safety analysis, to obtain a solution to the tail bound problem that is applicable even to programs with infinite expected cost. Specifically, this allows us to obtain runtime tail bounds for programs that do not terminate almost-surely.

In summary, we provide a novel combination of martingale-based cost analysis and quantitative safety analysis that is able to find expectation and tail cost bounds for probabilistic programs, without the restrictions of non-negative costs, bounded updates, or finiteness of the expected total cost. Finally, we provide experimental results showcasing that our approach can solve instances that were beyond the reach of previous methods.

CYCLE: Learning to Self-Refine the Code Generation

Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by the existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. For the cases when code LMs fail to implement the correct program, developers actually find it hard to debug and fix the faulty prediction since it is not written by the developers themselves. Unfortunately, our study reveals that code LMs cannot efficiently self-refine their faulty generations as well.

In this paper, we propose CYCLE framework, learning to self-refine the faulty generation according to the available feedback, such as the execution results reported by the test suites. We evaluate CYCLE on three popular code generation benchmarks, HumanEval, MBPP, and APPS. The results reveal that CYCLE successfully maintains, sometimes improves, the quality of one-time code generation, while significantly improving the self-refinement capability of code LMs. We implement four variants of CYCLE with varied numbers of parameters across 350M, 1B, 2B, and 3B, and the experiments show that CYCLE consistently boosts the code generation performance, by up to 63.5

AdoB: Bridging Benign and Byzantine Consensus with Atomic Distributed Objects

Achieving consensus is a challenging and ubiquitous problem in distributed systems that is only made harder by the introduction of malicious byzantine servers. While significant effort has been devoted to the benign and byzantine failure models individually, no prior work has considered the mechanized verification of both in a generic way. We claim this is due to the lack of an appropriate abstraction that is capable of representing both benign and byzantine consensus without either losing too much detail or becoming impractically complex. We build on recent work on the atomic distributed object model to fill this void with a novel abstraction called AdoB. In addition to revealing important insights into the essence of consensus, this abstraction has practical benefits for easing distributed system verification. As a case study, we proved safety and liveness properties for AdoB in Coq, which are the first such mechanized proofs to handle benign and byzantine consensus in a unified manner. We also demonstrate that AdoB faithfully models real consensus protocols by proving it is refined by standard network-level specifications of Fast Paxos and a variant of Jolteon.

PROMPT: A Fast and Extensible Memory Profiling Framework

Memory profiling captures programs’ dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT where all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1× respectively. To further underscore PROMPT’s impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.

Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach

While static analysis is instrumental in uncovering software bugs, its precision in analyzing large and intricate codebases remains challenging. The emerging prowess of Large Language Models (LLMs) offers a promising avenue to address these complexities. In this paper, we present LLift, a pioneering framework that synergizes static analysis and LLMs, with a spotlight on identifying use-before-initialization (UBI) bugs within the Linux kernel. Drawing from our insights into variable usage conventions in Linux, we enhance path analysis using post-constraint guidance. This approach, combined with our methodically crafted procedures, empowers LLift to adeptly handle the challenges of bug-specific modeling, extensive codebases, and the unpredictable nature of LLMs. Our real-world evaluations identified four previously undiscovered UBI bugs in the mainstream Linux kernel, which the Linux community has acknowledged. This study reaffirms the potential of marrying static analysis with LLMs, setting a compelling direction for future research in this area.

Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks

While deep neural networks provide state-of-the-art solutions to a wide range of programming language tasks, their effectiveness in dealing with foundational program analysis tasks remains under explored. In this paper, we present an empirical study that evaluates four prominent models of code (i.e., CuBERT, CodeBERT, GGNN, and Graph Sandwiches) in two such foundational tasks: (1) alias prediction, in which models predict whether two pointers must alias, may alias or must not alias; and (2) equivalence prediction, in which models predict whether or not two programs are semantically equivalent. At the core of this study is CodeSem, a dataset built upon the source code of real-world flagship software (e.g., Linux Kernel, GCC, MySQL) and manually validated for the two prediction tasks. Results show that all models are accurate in both prediction tasks, especially CuBERT with an accuracy of 89% and 84% in alias prediction and equivalence prediction, respectively. We also conduct a comprehensive, in-depth analysis of the results of all models in both tasks, concluding that deep learning models are generally capable of performing foundational tasks in program analysis even though in specific cases their weaknesses are also evident.

Our code and evaluation data are publicly available at

Inductive Diagrams for Causal Reasoning

The Lamport diagram is a pervasive and intuitive tool for informal reasoning about “happens-before” relationships in a concurrent system. However, traditional axiomatic formalizations of Lamport diagrams can be painful to work with in a mechanized setting like Agda. We propose an alternative, inductive formalization — the causal separation diagram (CSD) — that takes inspiration from string diagrams and concurrent separation logic, but enjoys a graphical syntax similar to Lamport diagrams. Critically, CSDs are based on the idea that causal relationships between events are witnessed by the paths that information follows between them. To that end, we model “happens-before” as a dependent type of paths between events.

The inductive formulation of CSDs enables their interpretation into a variety of semantic domains. We demonstrate the interpretability of CSDs with a case study on properties of logical clocks, widely-used mechanisms for reifying causal relationships as data. We carry out this study by implementing a series of interpreters for CSDs, culminating in a generic proof of Lamport’s clock condition that is parametric in a choice of clock. We instantiate this proof on Lamport’s scalar clock, on Mattern’s vector clock, and on the matrix clocks of Raynal et al. and of Wuu and Bernstein, yielding verified implementations of each. The CSD formalism and our case study are mechanized in the Agda proof assistant.

Quarl: A Learning-Based Quantum Circuit Optimizer

Optimizing quantum circuits is challenging due to the very large search space of functionally equivalent circuits and the necessity of applying transformations that temporarily decrease performance to achieve a final performance improvement. This paper presents Quarl, a learning-based quantum circuit optimizer. Applying reinforcement learning (RL) to quantum circuit optimization raises two main challenges: the large and varying action space and the non-uniform state representation. Quarl addresses these issues with a novel neural architecture and RL-training procedure. Our neural architecture decomposes the action space into two parts and leverages graph neural networks in its state representation, both of which are guided by the intuition that optimization decisions can be mostly guided by local reasoning while allowing global circuit-wide reasoning. Our evaluation shows that Quarl significantly outperforms existing circuit optimizers on almost all benchmark circuits. Surprisingly, Quarl can learn to perform rotation merging—a complex, non-local circuit optimization implemented as a separate pass in existing optimizers.

Qualifying System F<:: Some Terms and Conditions May Apply

Type qualifiers offer a lightweight mechanism for enriching existing type systems to enforce additional, desirable, program invariants. They do so by offering a restricted but effective form of subtyping. While the theory of type qualifiers is well understood and present in many programming languages today, polymorphism over type qualifiers remains an area less well examined. We explore how such a polymorphic system could arise by constructing a calculus, System F-sub-Q, which combines the higher-rank bounded polymorphism of System F-sub with the theory of type qualifiers. We explore how the ideas used to construct System F-sub-Q can be reused in situations where type qualifiers naturally arise---in reference immutability, function colouring, and capture checking. Finally, we re-examine other qualifier systems in the literature in light of the observations presented while developing System F-sub-Q.

Forge: A Tool and Language for Teaching Formal Methods

This paper presents the design of Forge, a tool for teaching formal methods gradually. Forge is based on the widely-used Alloy language and analysis tool, but contains numerous improvements based on more than a decade of experience teaching Alloy to students. Although our focus has been on the classroom, many of the ideas in Forge likely also apply to training in industry.

Forge offers a progression of languages that improve the learning experience by only gradually increasing in expressive power. Forge supports custom visualization of its outputs, enabling the use of widely-understood domain-specific representations. Finally, Forge provides a variety of testing features to ease the transition from programming to formal modeling. We present the motivation for and design of these aspects of Forge, and then provide a substantial evaluation based on multiple years of classroom use.

Design and Implementation of an Aspect-Oriented C Programming Language

Aspect-Oriented Programming (AOP) is a programming paradigm that implements crosscutting concerns in a modular way. People have witnessed the prosperity of AOP languages for Java and C++, such as AspectJ and AspectC++, which has propelled AOP to become an important programming paradigm with many interesting application scenarios, e.g., runtime verification. In contrast, the AOP languages for C are still poor and lack compiler support. In this paper, we design a new general-purpose and expressive aspect-oriented C programming language, namely Aclang, and implement a compiler for it, which brings fully-fledged AOP support into the C domain. We have evaluated the effectiveness and performance of our compiler against two state-of-the-art tools, ACC and AspectC++. In terms of effectiveness, Aclang outperforms ACC and AspectC++. In terms of performance, Aclang outperforms ACC in execution time and outperforms AspectC++ in both execution time and memory consumption.

Cedar: A New Language for Expressive, Fast, Safe, and Analyzable Authorization

Cedar is a new authorization policy language designed to be ergonomic, fast, safe, and analyzable. Rather than embed authorization logic in an application’s code, developers can write that logic as Cedar policies and delegate access decisions to Cedar’s evaluation engine. Cedar’s simple and intuitive syntax supports common authorization use-cases with readable policies, naturally leveraging concepts from role-based, attribute-based, and relation-based access control models. Cedar’s policy structure enables access requests to be decided quickly. Cedar’s policy validator leverages optional typing to help policy writers avoid mistakes, but not get in their way. Cedar’s design has been finely balanced to allow for a sound and complete logical encoding, which enables precise policy analysis, e.g., to ensure that when refactoring a set of policies, the authorized permissions do not change. We have modeled Cedar in the Lean programming language, and used Lean’s proof assistant to prove important properties of Cedar’s design. We have implemented Cedar in Rust, and released it open-source. Comparing Cedar to two open-source languages, OpenFGA and Rego, we find (subjectively) that Cedar has equally or more readable policies, but (objectively) performs far better.

Persimmon: Nested Family Polymorphism with Extensible Variant Types

Many obstacles stand in the way of modular, extensible code. Some language constructs, such as pattern matching, are not easily extensible. Inherited code may not be type safe in the presence of extended types. The burden of setting up design patterns can discourage users, and parameter clutter can make the code less readable. Given these challenges, it is no wonder that extensibility often gives way to code duplication. We present our solution: Persimmon, a functional system with nested family polymorphism, extensible variant types, and extensible pattern matching. Most constructs in our language are built-in "extensibility hooks," cutting down on the parameter clutter and user burden associated with extensible code. Persimmon preserves the relationships between nested families upon inheritance, enabling extensibility at a large scale. Since nested family polymorphism can express composable extensions, Persimmon supports mixins via an encoding. We show how Persimmon can be compiled into a functional language without extensible variants with our translation to Scala. Finally, we show that our system is sound by proving the properties of progress and preservation.

Hydra: Generalizing Peephole Optimizations with Program Synthesis

Optimizing compilers rely on peephole optimizations to simplify combinations of instructions and remove redundant instructions. Typically, a new peephole optimization is added when a compiler developer notices an optimization opportunity---a collection of dependent instructions that can be improved---and manually derives a more general rewrite rule that optimizes not only the original code, but also other, similar collections of instructions. In this paper, we present Hydra, a tool that automates the process of generalizing peephole optimizations using a collection of techniques centered on program synthesis. One of the most important problems we have solved is finding a version of each optimization that is independent of the bitwidths of the optimization's inputs (when this version exists). We show that Hydra can generalize 75% of the ungeneralized missed peephole optimizations that LLVM developers have posted to the LLVM project's issue tracker. All of Hydra's generalized peephole optimizations have been formally verified, and furthermore we can automatically turn them into C++ code that is suitable for inclusion in an LLVM pass.

Multiverse Notebook: Shifting Data Scientists to Time Travelers

Computational notebook environments are popular and de facto standard tools for programming in data science, whereas computational notebooks are notorious in software engineering. The criticism there stems from the characteristic of facilitating unrestricted dynamic patching of running programs, which makes exploratory coding quick but the resultant code messy and inconsistent. In this work, we first reveal that dynamic patching is a natural demand rather than a mere bad practice in data science programming on Kaggle. We then develop Multiverse Notebook, a computational notebook engine for time-traveling exploration. It enables users to time-travel to any past state and restart with new code from there under state isolation. We present an approach to efficiently implementing time-traveling exploration. We empirically evaluate Multiverse Notebook on ten real-world tasks from Kaggle. Our experiments show that time-traveling exploration on Multiverse Notebook is reasonably efficient.

Hopping Proofs of Expectation-Based Properties: Applications to Skiplists and Security Proofs

We propose, implement, and evaluate a hopping proof approach for proving expectation-based properties of probabilistic programs. Our approach combines EHL, a syntax-directed proof system for reducing proof goals of a program to proof goals of simpler programs, with a "hopping" proof rule for reducing proof goals of an original program to proof goal of a different program which is suitably related (by means of pRHL, a relational program logic for probabilistic program) to the original program. We prove that EHL is sound for a core language with procedure calls and adversarial computations, and complete for the adversary-free fragment of the language. We also provide an implementation of EHL into EasyCrypt, a proof assistant tailored for reasoning about relational properties of probabilistic programs. We provide a tight integration of EHL with other program logics supported by EasyCrypt, and in particular probabilistic Relational Hoare Logic (pRHL). Using this tight integration, we give mechanized proofs of expected complexity of in-place implementations of randomized quickselect and skip lists. We also sketch applications of our approach to cryptographic proofs and discuss the broader impact of EHL in the EasyCrypt proof assistant.

Accurate Data Race Prediction in the Linux Kernel through Sparse Fourier Learning

Testing for data races in the Linux OS kernel is challenging because there is an exponentially large space of system calls and thread interleavings that can potentially lead to concurrent executions with races. In this work, we introduce a new approach for modeling execution trace feasibility and apply it to Linux OS Kernel race prediction. To address the fundamental scalability challenge posed by the exponentially large domain of possible execution traces, we decompose the task of predicting trace feasibility into independent prediction subtasks encoded as learning Boolean indicator functions for specific memory accesses, and apply a sparse fourier learning approach to learning each feasibility subtask.

Boolean functions that are sparse in their fourier domain can be efficiently learned by estimating the coefficients of their fourier expansion. Since the feasibility of each memory access depends on only a few other relevant memory accesses or system calls (e.g., relevant inter-thread communications), we observe that trace feasibility functions often have this sparsity property and can be learned efficiently. We use learned trace feasibility functions in conjunction with conservative alias analysis to implement a kernel race-testing system, HBFourier, that uses sparse fourier learning to efficiently model feasibility when making predictions. We evaluate our approach on a recent Linux development kernel and show it finds 44 more races with 15.7% more accurate race predictions than the next best performing system in our evaluation, in addition to identifying 5 new race bugs confirmed by kernel developers.

TorchQL: A Programming Framework for Integrity Constraints in Machine Learning

Finding errors in machine learning applications requires a thorough exploration of their behavior over data. Existing approaches used by practitioners are often ad-hoc and lack the abstractions needed to scale this process. We present TorchQL, a programming framework to evaluate and improve the correctness of machine learning applications. TorchQL allows users to write queries to specify and check integrity constraints over machine learning models and datasets. It seamlessly integrates relational algebra with functional programming to allow for highly expressive queries using only eight intuitive operators. We evaluate TorchQL on diverse use-cases including finding critical temporal inconsistencies in objects detected across video frames in autonomous driving, finding data imputation errors in time-series medical records, finding data labeling errors in real-world images, and evaluating biases and constraining outputs of language models. Our experiments show that TorchQL enables up to 13x faster query executions than baselines like Pandas and MongoDB, and up to 40% shorter queries than native Python. We also conduct a user study and find that TorchQL is natural enough for developers familiar with Python to specify complex integrity constraints.

Gradually Typed Languages Should Be Vigilant!

In gradual typing, different languages perform different dynamic type checks for the same program even though the languages have the same static type system. This raises the question of whether, given a gradually typed language, the combination of the translation that injects checks in well-typed terms and the dynamic semantics that determines their behavior sufficiently enforce the static type system of the language. Neither type soundness, nor complete monitoring, nor any other meta-theoretic property of gradually typed languages to date provides a satisfying answer.

In response, we present vigilance, a semantic analytical instrument that defines when the check-injecting translation and dynamic semantics of a gradually typed language are adequate for its static type system. Technically, vigilance asks if a given translation-and-semantics combination enforces the complete run-time typing history of a value, which consists of all of the types associated with the value. We show that the standard combination for so-called Natural gradual typing is vigilant for the standard simple type system, but the standard combination for Transient gradual typing is not. At the same time, the standard combination for Transient is vigilant for a tag type system but the standard combination for Natural is not. Hence, we clarify the comparative type-level reasoning power between the two most studied approaches to sound gradual typing. Furthermore, as an exercise that demonstrates how vigilance can guide design, we introduce and examine a new theoretical static gradual type system, dubbed truer, that is stronger than tag typing and more faithfully reflects the type-level reasoning power that the dynamic semantics of Transient gradual typing can guarantee.

Distributions for Compositionally Differentiating Parametric Discontinuities

Computations in physical simulation, computer graphics, and probabilistic inference often require the differentiation of discontinuous processes due to contact, occlusion, and changes at a point in time. Popular differentiable programming languages, such as PyTorch and JAX, ignore discontinuities during differentiation. This is incorrect for parametric discontinuities—conditionals containing at least one real-valued parameter and at least one variable of integration. We introduce Potto, the first differentiable first-order programming language to soundly differentiate parametric discontinuities. We present a denotational semantics for programs and program derivatives and show the two accord. We describe the implementation of Potto, which enables separate compilation of programs. Our prototype implementation overcomes previous compile-time bottlenecks achieving an 88.1x and 441.2x speed up in compile time and a 2.5x and 7.9x speed up in runtime, respectively, on two increasingly large image stylization benchmarks. We showcase Potto by implementing a prototype differentiable renderer with separately compiled shaders.

Exact Bayesian Inference for Loopy Probabilistic Programs using Generating Functions

We present an exact Bayesian inference method for inferring posterior distributions encoded by probabilistic programs featuring possibly unbounded loops. Our method is built on a denotational semantics represented by probability generating functions, which resolves semantic intricacies induced by intertwining discrete probabilistic loops with conditioning (for encoding posterior observations). We implement our method in a tool called Prodigy; it augments existing computer algebra systems with the theory of generating functions for the (semi-)automatic inference and quantitative verification of conditioned probabilistic programs. Experimental results show that Prodigy can handle various infinite-state loopy programs and exhibits comparable performance to state-of-the-art exact inference tools over loop-free benchmarks.

Learning Abstraction Selection for Bayesian Program Analysis

We propose a learning-based approach to select abstractions for Bayesian program analysis. Bayesian program analysis converts a program analysis into a Bayesian model by attaching probabilities to analysis rules. It computes probabilities of analysis results and can update them by learning from user feedback, test runs, and other information. Its abstraction heavily affects how well it learns from such information. There exists a long line of works in selecting abstractions for conventional program analysis but they are not effective for Bayesian program analysis. This is because they do not optimize for generalization ability. We propose a data-driven framework to solve this problem by learning from labeled programs. Starting from an abstraction, it decides how to change the abstraction based on analysis derivations. To be general, it considers graph properties of analysis derivations; to be effective, it considers the derivations before and after changing the abstraction. We demonstrate the effectiveness of our approach using a datarace analysis and a thread-escape analysis.

Deriving Dependently-Typed OOP from First Principles

The expression problem describes how most types can easily be extended with new ways to produce the type or new ways to consume the type, but not both. When abstract syntax trees are defined as an algebraic data type, for example, they can easily be extended with new consumers, such as print or eval, but adding a new constructor requires the modification of all existing pattern matches. The expression problem is one way to elucidate the difference between functional or data-oriented programs (easily extendable by new consumers) and object-oriented programs (easily extendable by new producers). This difference between programs which are extensible by new producers or new consumers also exists for dependently typed programming, but with one core difference: Dependently-typed programming almost exclusively follows the functional programming model and not the object-oriented model, which leaves an interesting space in the programming language landscape unexplored. In this paper, we explore the field of dependently-typed object-oriented programming by deriving it from first principles using the principle of duality. That is, we do not extend an existing object-oriented formalism with dependent types in an ad-hoc fashion, but instead start from a familiar data-oriented language and derive its dual fragment by the systematic use of defunctionalization and refunctionalization. Our central contribution is a dependently typed calculus which contains two dual language fragments. We provide type- and semantics-preserving transformations between these two language fragments: defunctionalization and refunctionalization. We have implemented this language and these transformations and use this implementation to explain the various ways in which constructions in dependently typed programming can be explained as special instances of the general phenomenon of duality.

Verification of Neural Networks’ Global Robustness

Neural networks are successful in various applications but are also susceptible to adversarial attacks. To show the safety of network classifiers, many verifiers have been introduced to reason about the local robustness of a given input to a given perturbation. While successful, local robustness cannot generalize to unseen inputs. Several works analyze global robustness properties, however, neither can provide a precise guarantee about the cases where a network classifier does not change its classification. In this work, we propose a new global robustness property for classifiers aiming at finding the minimal globally robust bound, which naturally extends the popular local robustness property for classifiers. We introduce VHAGaR, an anytime verifier for computing this bound. VHAGaR relies on three main ideas: encoding the problem as a mixed-integer programming and pruning the search space by identifying dependencies stemming from the perturbation or the network's computation and generalizing adversarial attacks to unknown inputs. We evaluate VHAGaR on several datasets and classifiers and show that, given a three hour timeout, the average gap between the lower and upper bound on the minimal globally robust bound computed by VHAGaR is 1.9, while the gap of an existing global robustness verifier is 154.7. Moreover, VHAGaR is 130.6x faster than this verifier. Our results further indicate that leveraging dependencies and adversarial attacks makes VHAGaR 78.6x faster.

Functional Ownership through Fractional Uniqueness

Ownership and borrowing systems, designed to enforce safe memory management without the need for garbage collection, have been brought to the fore by the Rust programming language. Rust also aims to bring some guarantees offered by functional programming into the realm of performant systems code, but the type system is largely separate from the ownership model, with type and borrow checking happening in separate compilation phases. Recent models such as RustBelt and Oxide aim to formalise Rust in depth, but there is less focus on integrating the basic ideas into more traditional type systems. An approach designed to expose an essential core for ownership and borrowing would open the door for functional languages to borrow concepts found in Rust and other ownership frameworks, so that more programmers can enjoy their benefits.

One strategy for managing memory in a functional setting is through uniqueness types, but these offer a coarse-grained view: either a value has exactly one reference, and can be mutated safely, or it cannot, since other references may exist. Recent work demonstrates that linear and uniqueness types can be combined in a single system to offer restrictions on program behaviour and guarantees about memory usage. We develop this connection further, showing that just as graded type systems like those of Granule and Idris generalise linearity, a Rust-like ownership model arises as a graded generalisation of uniqueness. We combine fractional permissions with grading to give the first account of ownership and borrowing that smoothly integrates into a standard type system alongside linearity and graded types, and extend Granule accordingly with these ideas.

VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints

The task of SQL query equivalence checking is important in various real-world applications (including query rewriting and automated grading) that involve complex queries with integrity constraints; yet, state-of-the-art techniques are very limited in their capability of reasoning about complex features (e.g., those that involve sorting, case statement, rich integrity constraints, etc.) in real-life queries. To the best of our knowledge, we propose the first SMT-based approach and its implementation, VeriEQL, capable of proving and disproving bounded equivalence of complex SQL queries. VeriEQL is based on a new logical encoding that models query semantics over symbolic tuples using the theory of integers with uninterpreted functions. It is simple yet highly practical -- our comprehensive evaluation on over 20,000 benchmarks shows that VeriEQL outperforms all state-of-the-art techniques by more than one order of magnitude in terms of the number of benchmarks that can be proved or disproved. VeriEQL can also generate counterexamples that facilitate many downstream tasks (such as finding serious bugs in systems like MySQL and Apache Calcite).

PyDex: Repairing Bugs in Introductory Python Assignments using LLMs

Students often make mistakes in their introductory programming assignments as part of their learning process. Unfortunately, providing custom repairs for these mistakes can require a substantial amount of time and effort from class instructors. Automated program repair (APR) techniques can be used to synthesize such fixes. Prior work has explored the use of symbolic and neural techniques for APR in the education domain. Both types of approaches require either substantial engineering efforts or large amounts of data and training. We propose to use a large language model trained on code, such as Codex (a version of GPT), to build an APR system -- PyDex -- for introductory Python programming assignments. Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking. We evaluate PyDex on 286 real student programs and compare to three baselines, including one that combines a state-of-the-art Python syntax repair engine, BIFI, and a state-of-the-art Python semantic repair engine for student assignments, Refactory. We find that PyDex can fix more programs and produce smaller patches on average.

Seneca: Taint-Based Call Graph Construction for Java Object Deserialization

Object serialization and deserialization are widely used for storing and preserving objects in files, memory, or database as well as for transporting them across machines, enabling remote interaction among processes and many more. This mechanism relies on reflection, a dynamic language that introduces serious challenges for static analyses. Current state-of-the-art call graph construction algorithms do not fully support object serialization/deserialization, i.e., they are unable to uncover the callback methods that are invoked when objects are serialized and deserialized. Since call graphs are a core data structure for multiple types of analysis (e.g., vulnerability detection), an appropriate analysis cannot be performed since the call graph does not capture hidden (vulnerable) paths that occur via callback methods. In this paper, we present Seneca, an approach for handling serialization with improved soundness in the context of call graph construction. Our approach relies on taint analysis and API modeling to construct sound call graphs. We evaluated our approach with respect to soundness, precision, performance, and usefulness in detecting untrusted object deserialization vulnerabilities. Our results show that Seneca can create sound call graphs with respect to serialization features. The resulting call graphs do not incur significant runtime overhead and were shown to be useful for performing identification of vulnerable paths caused by untrusted object deserialization.

A Pure Demand Operational Semantics with Applications to Program Analysis

This paper develops a novel minimal-state operational semantics for higher-order functional languages that uses only the call stack and a source program point or a lexical level as the complete state information: there is no environment, no substitution, no continuation, etc. We prove this form of operational semantics equivalent to standard presentations.

We then show how this approach can open the door to potential new applications: we define a program analysis as a direct finitization of this operational semantics. The program analysis that naturally emerges has a number of novel and interesting properties compared to standard program analyses for higher-order programs: for example, it can infer recurrences and does not need value widening. We both give a formal definition of the analysis and describe our current implementation.

Degrees of Separation: A Flexible Type System for Safe Concurrency

Data races have long been a notorious problem in concurrent programming. They are hard to detect, and lead to non-deterministic behaviours. There has been a lot of interest in type systems that statically guarantee data race freedom. Significant progress has been made in this area, and these type systems are increasingly usable and practical. However, their adoption in mainstream programming languages is still limited, which is largely attributed to their strict alias prevention principles that obstruct the usage of existing programming patterns. This is a deterrent to the migration of existing code bases. To tackle this problem, we propose Capture Separation Calculus (System CSC), a calculus that models fork-join parallelism and statically prevents data races while being compatible with established programming patterns. It follows a control-as-you-need philosophy: by default, aliases are allowed, but they are tracked in the type system. When data races are a concern, the tracked aliases are controlled to prevent data-race-prone patterns. We study the formal properties of System CSC. Type soundness is proven via the standard progress and preservation theorems. Additionally, we formally verify the data race freedom property of System CSC by proving that the reduction of a well-typed program is confluent.

ParDiff: Practical Static Differential Analysis of Network Protocol Parsers

Countless devices all over the world are connected by networks and communicated via network protocols. Just like common software, protocol implementations suffer from bugs, many of which only cause silent data corruption instead of crashes. Hence, existing automated bug-finding techniques focused on memory safety, such as fuzzing, can hardly detect them. In this work, we propose a static differential analysis called ParDiff to find protocol implementation bugs, especially silent ones hidden in message parsers. Our key observation is that a network protocol often has multiple implementations and any semantic discrepancy between them may indicate bugs. However, different implementations are often written in disparate styles, e.g., using different data structures or written with different control structures, making it challenging to directly compare two implementations of even the same protocol. To exploit this observation and effectively compare multiple protocol implementations, ParDiff (1) automatically extracts finite state machines from programs to represent protocol format specifications, and (2) then leverages bisimulation and SMT solvers to find fine-grained and semantic inconsistencies between them. We have extensively evaluated ParDiff using 14 network protocols. The results show that ParDiff outperforms both differential symbolic execution and differential fuzzing tools. To date, we have detected 41 bugs with 25 confirmed by developers.

A Constraint Solving Approach to Parikh Images of Regular Languages

A common problem in string constraint solvers is computing the Parikh image, a linear arithmetic formula that describes all possible combinations of character counts in strings of a given language. Automata-based string solvers frequently need to compute the Parikh image of products (or intersections) of finite-state automata, in particular when solving string constraints that also include the integer data-type due to operations like string length and indexing. In this context, the computation of Parikh images often turns out to be both prohibitively slow and memory-intensive. This paper contributes a new understanding of how the reasoning about Parikh images can be cast as a constraint solving problem, and questions about Parikh images be answered without explicitly computing the product automaton or the exact Parikh image. The paper shows how this formulation can be efficiently implemented as a calculus, PC*, embedded in an automated theorem prover supporting Presburger logic. The resulting standalone tool Catra is evaluate on constraints produced by the Ostrich+ string solver when solving standard string constraint benchmarks involving integer operations. The experiments show that PC* strictly outperforms the standard approach by Verma et al. to extract Parikh images from finite-state automata, as well as the over-approximating method recently described by Janků and Turoňová by a wide margin, and for realistic timeouts (under 60 s) also the nuXmv model checker. When added as the Parikh image backend of Ostrich+ to the Ostrich string constraint solver’s portfolio, it boosts its results on the quantifier-free strings with linear integer algebra track of SMT-COMP 2023 (QF_SLIA) enough to solve the most Unsat instances in that track of all competitors.

PP-CSA: Practical Privacy-Preserving Software Call Stack Analysis

Software call stack is a sequence of function calls that are executed during the runtime of a software program. Software call stack analysis (CSA) is widely used in software engineering to analyze the runtime behavior of software, which can be used to optimize the software performance, identify bugs, and profile the software. Despite the benefits of CSA, it has recently come under scrutiny due to concerns about privacy. To date, software is often deployed at user-side devices like mobile phones and smart watches. The collected call stacks may thus contain privacy-sensitive information, such as healthy information or locations, depending on the software functionality. Leaking such information to third parties may cause serious privacy concerns such as discrimination and targeted advertisement.

This paper presents PP-CSA, a practical and privacy-preserving CSA framework that can be deployed in real-world scenarios. Our framework leverages local differential privacy (LDP) as a principled privacy guarantee, to mutate the collected call stacks and protect the privacy of individual users. Furthermore, we propose several key design principles and optimizations in the technical pipeline of PP-CSA, including an encoder-decoder scheme to properly enforce LDP over software call stacks, and several client/server-side optimizations to largely improve the efficiency of PP-CSA. Our evaluation over real-world Java and Android programs shows that our privacy-preserving CSA pipeline can achieve high utility and privacy guarantees while maintaining high efficiency. We have released our implementation of PP-CSA as an open-source project at for results reproducibility. We will provide more detailed documents to support and the usage and extension of the community.

Scenario-Based Proofs for Concurrent Objects

Concurrent objects form the foundation of many applications that exploit multicore architectures and their importance has lead to informal correctness arguments, as well as formal proof systems. Correctness arguments (as found in the distributed computing literature) give intuitive descriptions of a few canonical executions or "scenarios" often each with only a few threads, yet it remains unknown as to whether these intuitive arguments have a formal grounding and extend to arbitrary interleavings over unboundedly many threads.

We present a novel proof technique for concurrent objects, based around identifying a small set of scenarios (representative, canonical interleavings), formalized as the commutativity quotient of a concurrent object. We next give an expression language for defining abstractions of the quotient in the form of regular or context-free languages that enable simple proofs of linearizability. These quotient expressions organize unbounded interleavings into a form more amenable to reasoning and make explicit the relationship between implementation-level contention/interference and ADT-level transitions.

We evaluate our work on numerous non-trivial concurrent objects from the literature (including the Michael-Scott queue, Elimination stack, SLS reservation queue, RDCSS and Herlihy-Wing queue). We show that quotients capture the diverse features/complexities of these algorithms, can be used even when linearization points are not straight-forward, correspond to original authors' correctness arguments, and provide some new scenario-based arguments. Finally, we show that discovery of some object's quotients reduces to two-thread reasoning and give an implementation that can derive candidate quotients expressions from source code.

Mechanizing the CMP Abstraction for Parameterized Verification

Parameterized verification is a challenging problem that is known to be undecidable in the general case.  ‍is a widely-used method for parameterized verification, originally proposed by Chou, Mannava and Park in 2004. It involves abstracting the protocol to a small fixed number of nodes, and strengthening by auxiliary invariants to refine the abstraction. In most of the existing applications of CMP, the abstraction and strengthening procedures are carried out manually, which can be tedious and error-prone. Existing theoretical justification of the  ‍method is also done at a high level, without detailed descriptions of abstraction and strengthening rules. In this paper, we present a formally verified theory of  ‍in Isabelle/HOL, with detailed, syntax-directed procedure for abstraction and strengthening that is proven correct. The formalization also includes correctness of symmetry reduction and assume-guarantee reasoning. We also describe a tool AutoCMP for automatically carrying out abstraction and strengthening in , as well as generating Isabelle proof scripts showing their correctness. We applied the tool to a number of parameterized protocols, and discovered some inaccuracies in previous manual applications of  ‍to the FLASH cache coherence protocol.

Message-Observing Sessions

We present Most, a process language with message-observing session types. Message-observing session types extend binary session types with type-level computation to specify communication protocols that vary based on messages observed on other channels. Hence, Most allows us to express global invariants about processes, rather than just local invariants, in a bottom-up, compositional way. We give Most a semantic foundation using traces with binding, a semantic approach for compositionally reasoning about traces in the presence of name generation. We use this semantics to prove type soundness and compositionality for Most processes. We see this as a significant step towards capturing message-dependencies and providing more precise guarantees about processes.

Understanding and Finding Java Decompiler Bugs

Java decompilers are programs that perform the reverse process of Java compilers, i.e., they translate Java bytecode to Java source code. They are essential for reverse engineering purposes and have become more sophisticated and reliable over the years. However, it remains challenging for modern Java decompilers to reliably perform correct decompilation on real-world programs. To shed light on the key challenges of Java decompilation, this paper provides the first systematic study on the characteristics and causes of bugs in mature, widely-used Java decompilers. We conduct the study by investigating 333 unique bugs from three popular Java decompilers. Our key findings and observations include: (1) Although most of the reported bugs were found when decompiling large, real-world code, 40.2% of them have small test cases for bug reproduction; (2) Over 80% of the bugs manifest as exceptions, syntactic errors, or semantic errors, and bugs with source code artifacts are very likely semantic errors; (3) 57.7%, 39.0%, and 41.1% of the bugs respectively are attributed to three stages of decompilers—loading structure entities from bytecode, optimizing these entities, and generating source code from these entities; (4) Bugs in decompilers’ type inference are the most complex to fix; and (5) Region restoration for structures like loop, sugaring for special structures like switch, and type inference of variables of generic types or indistinguishable types are the three most significant challenges in Java decompilation, which to some extent explains our findings in (3) and (4).

Based on these findings, we present JD-Tester, a differential testing framework for Java decompilers, and our experience of using it in testing the three popular Java decompilers. JD-Testerutilizes different Java program generators to construct executable Java tests and finds exceptions, syntactic, and semantic inconsistencies (i.e. bugs) between a generated test and its compiled-decompiled version (through compilation and execution). In total, we have found 62 bugs in the three decompilers, demonstrating both the effectiveness of JD-Tester, and the importance of testing and validating Java decompilers.

Taypsi: Static Enforcement of Privacy Policies for Policy-Agnostic Oblivious Computation

Secure multiparty computation (MPC) techniques enable multiple parties to compute joint functions over their private data without sharing that data with other parties, typically by employing powerful cryptographic protocols to protect individual's data. One challenge when writing such functions is that most MPC languages force users to intermix programmatic and privacy concerns in a single application, making it difficult to change or audit a program's underlying privacy policy. Prior policy-agnostic MPC languages relied on dynamic enforcement to decouple privacy requirements from program logic. Unfortunately, the resulting overhead makes it difficult to scale MPC applications that manipulate structured data. This work proposes to eliminate this overhead by instead transforming programs into semantically equivalent versions that statically enforce user-provided privacy policies. We have implemented this approach in a new MPC language, called Taypsi; our experimental evaluation demonstrates that the resulting system features considerable performance improvements on a variety of MPC applications involving structured data and complex privacy policies.

Iterative-Epoch Online Cycle Elimination for Context-Free Language Reachability

Context-free language reachability (CFL-reachability) is a fundamental framework for implementing various static analyses. CFL-reachability utilizes context-free grammar (CFG) to extend the expressiveness of ordinary graph reachability from an unlabeled graph to an edge-labeled graph. Solving CFL-reachability requires a (sub)cubic time complexity with respect to the graph size, which limits its scalability in practice. Thus, an approach that can effectively reduce the graph size while maintaining the reachability result is highly desirable. Most of the existing graph simplification techniques for CFL-reachability work during the preprocessing stage, i.e., before the dynamic CFL-reachability solving process. However, in real-world CFL-reachability analyses, there is a large number of reducible nodes and edges that can only be discovered during dynamic solving, leaving significant room for on-the-fly improvements.

This paper aims to reduce the graph size of CFL-reachability dynamically via online cycle elimination. We propose a simple yet effective approach to detect collapsible cycles in the graph based on the input context-free grammar. Our key insight is that symbols with particular forms of production rules in the grammar are the essence of transitivity of reachability relations in the graph. Specifically, in the graph, a reachability relation to a node v_i can be "transited" to another node v_j if there is a transitive relation from v_i to v_j, and cycles formed by transitive relations are collapsible. In this paper, we present an approach to identify the transitive symbols in a context-free grammar and propose an iterative-epoch framework for online cycle elimination. From the perspective of non-parallelized CFL-reachability solving, our iterative-epoch framework is well compatible with both the standard (unordered) solver and the recent ordered solver, and can significantly improve their performance. Our experiment on context-sensitive value-flow analysis for C/C++ and field-sensitive alias analysis for Java demonstrates promising performance improvement by our iterative-epoch cycle elimination technique. By collapsing cycles online, our technique accelerates the standard solver by 17.17× and 13.94× for value-flow analysis and alias analysis, respectively, with memory reductions of 48.8% and 45.0%. Besides, our technique can also accelerate the ordered solver by 14.32× and 8.36× for value-flow analysis and alias analysis, respectively, with memory reductions of 55.2% and 57.8%.

Modeling Dynamic (De)Allocations of Local Memory for Translation Validation

End-to-End Translation Validation is the problem of verifying the executable code generated by a compiler against the corresponding input source code for a single compilation. This becomes particularly hard in the presence of dynamically-allocated local memory where addresses of local memory may be observed by the program. In the context of validating the translation of a C procedure to executable code, a validator needs to tackle constant-length local arrays, address-taken local variables, address-taken formal parameters, variable-length local arrays, procedure-call arguments (including variadic arguments), and the alloca() operator. We provide an execution model, a definition of refinement, and an algorithm to soundly convert a refinement check into first-order logic queries that an off-the-shelf SMT solver can handle efficiently. In our experiments, we perform blackbox translation validation of C procedures (with up to 100+ SLOC), involving these local memory allocation constructs, against their corresponding assembly implementations (with up to 200+ instructions) generated by an optimizing compiler with complex loop and vectorizing transformations.