Proceedings of the ACM on Programming Languages: Vol. 6, No. OOPSLA1. 2022

Full Citation in the ACM Digital Library


Weighted programming: a programming paradigm for specifying mathematical models

We study weighted programming, a programming paradigm for specifying mathematical models. More specifically, the weighted programs we investigate are like usual imperative programs with two additional features: (1) nondeterministic branching and (2) weighting execution traces. Weights can be numbers but also other objects like words from an alphabet, polynomials, formal power series, or cardinal numbers. We argue that weighted programming as a paradigm can be used to specify mathematical models beyond probability distributions (as is done in probabilistic programming). We develop weakest-precondition- and weakest-liberal-precondition-style calculi à la Dijkstra for reasoning about mathematical models specified by weighted programs. We present several case studies. For instance, we use weighted programming to model the ski rental problem — an optimization problem. We model not only the optimization problem itself, but also the best deterministic online algorithm for solving this problem as weighted programs. By means of weakest-precondition-style reasoning, we can determine the competitive ratio of the online algorithm on source code level.

Synthesizing fine-grained synchronization protocols for implicit monitors

A monitor is a widely-used concurrent programming abstraction that encapsulates all shared state between threads. Monitors can be classified as being either implicit or explicit depending on the primitives they provide. Implicit monitors are much easier to program but typically not as efficient. To address this gap, there has been recent research on automatically synthesizing explicit-signal monitors from an implicit specification, but prior work does not exploit all paralellization opportunities due to the use of a single lock for the entire monitor. This paper presents a new technique for synthesizing fine-grained explicit-synchronization protocols from implicit monitors. Our method is based on two key innovations: First, we present a new static analysis for inferring safe interleavings that allow violating mutual exclusion of monitor operations without changing its semantics. Second, we use the results of this static analysis to generate a MaxSAT instance whose models correspond to correct-by-construction synchronization protocols. We have implemented our approach in a tool called Cortado and evaluate it on monitors that contain parallelization opportunities. Our evaluation shows that Cortado can synthesize synchronization policies that are competitive with, or even better than, expert-written ones on these benchmarks.

Complexity-guided container replacement synthesis

Containers, such as lists and maps, are fundamental data structures in modern programming languages. However, improper choice of container types may lead to significant performance issues. This paper presents Cres, an approach that automatically synthesizes container replacements to improve runtime performance. The synthesis algorithm works with static analysis techniques to identify how containers are utilized in the program, and attempts to select a method with lower time complexity for each container method call. Our approach can preserve program behavior and seize the opportunity of reducing execution time effectively for general inputs. We implement Cres and evaluate it on 12 real-world Java projects. It is shown that Cres synthesizes container replacements for the projects with 384.2 KLoC in 14 minutes and discovers six categories of container replacements, which can achieve an average performance improvement of 8.1%.

Linear types for large-scale systems verification

Reasoning about memory aliasing and mutation in software verification is a hard problem. This is especially true for systems using SMT-based automated theorem provers. Memory reasoning in SMT verification typically requires a nontrivial amount of manual effort to specify heap invariants, as well as extensive alias reasoning from the SMT solver. In this paper, we present a hybrid approach that combines linear types with SMT-based verification for memory reasoning. We integrate linear types into Dafny, a verification language with an SMT backend, and show that the two approaches complement each other. By separating memory reasoning from verification conditions, linear types reduce the SMT solving time. At the same time, the expressiveness of SMT queries extends the flexibility of the linear type system. In particular, it allows our linear type system to easily and correctly mix linear and nonlinear data in novel ways, encapsulating linear data inside nonlinear data and vice-versa. We formalize the core of our extensions, prove soundness, and provide algorithms for linear type checking. We evaluate our approach by converting the implementation of a verified storage system (about 24K lines of code and proof) written in Dafny, to use our extended Dafny. The resulting system uses linear types for 91% of the code and SMT-based heap reasoning for the remaining 9%. We show that the converted system has 28% fewer lines of proofs and 30% shorter verification time overall. We discuss the development overhead in the original system due to SMT-based heap reasoning and highlight the improved developer experience when using linear types.

Plausible sealing for gradual parametricity

Graduality and parametricity have proven to be extremely challenging notions to bring together. Intuitively, enforcing parametricity gradually requires possibly sealing values in order to detect violations of uniform behavior. Toro et al. (2019) argue that the two notions are incompatible in the context of System F, where sealing is transparently driven by potentially imprecise type information, while New et al. (2020) reconcile both properties at the cost of abandoning the syntax of System F and requiring user-provided sealing annotations that are not subject to graduality guarantees. Furthermore, all current proposals rely on a global form of dynamic sealing in order to enforce parametric behavior at runtime, which weakens parametric reasoning and breaks equivalences in the static language. Based on the observation that the tension between graduality and parametricity comes from the early commitment to seal values based on type information, we propose plausible sealing as a new intermediate language mechanism that allows postponing such decisions to runtime. We propose an intermediate language for gradual parametricity, Funky, which supports plausible sealing in a simplified setting where polymorphism is restricted to instantiations with base and variable types. We prove that Funky satisfies both parametricity and graduality, mechanizing key lemmas in Agda. Additionally, we avoid global dynamic sealing and instead propose a novel lexically-scoped form of sealing realized using a representation of evidence inspired by the category of spans. As a consequence, Funky satisfies a standard formulation of parametricity that does not break System F equivalences. In order to show the practicality of plausible sealing, we describe a translation from Funk, a source language without explicit sealing, to Funky, that takes care of inserting plausible sealing forms. We establish graduality of Funk, subject to a restriction on type applications, and explain the source-level parametric reasoning it supports. Finally, we provide an interactive prototype along with illustrative examples both novel and from the literature.

Automated transpilation of imperative to functional code using neural-guided program synthesis

While many mainstream languages such as Java, Python, and C# increasingly incorporate functional APIs to simplify programming and improve parallelization/performance, there are no effective techniques that can be used to automatically translate existing imperative code to functional variants using these APIs. Motivated by this problem, this paper presents a transpilation approach based on inductive program synthesis for modernizing existing code. Our method is based on the observation that the overwhelming majority of source/target programs in this setting satisfy an assumption that we call trace-compatibility: not only do the programs share syntactically identical low-level expressions, but these expressions also take the same values in corresponding execution traces. Our method leverages this observation to design a new neural-guided synthesis algorithm that (1) uses a novel neural architecture called cognate grammar network (CGN) and (2) leverages a form of concolic execution to prune partial programs based on intermediate values that arise during a computation. We have implemented our approach in a tool called NGST2 and use it to translate imperative Java and Python code to functional variants that use the Stream and functools APIs respectively. Our experiments show that NGST2 significantly outperforms several baselines and that our proposed neural architecture and pruning techniques are vital for achieving good results.

On incorrectness logic for Quantum programs

Bug-catching is important for developing quantum programs. Motivated by the incorrectness logic for classical programs, we propose an incorrectness logic towards a logical foundation for static bug-catching in quantum programming. The validity of formulas in this logic is dual to that of quantum Hoare logics. We justify the formulation of validity by an intuitive explanation from a reachability point of view and a comparison against several alternative formulations. Compared with existing works focusing on dynamic analysis, our logic provides sound and complete arguments. We further demonstrate the usefulness of the logic by reasoning several examples, including Grover's search, quantum teleportation, and a repeat-until-success program. We also automate the reasoning procedure by a prototyped static analyzer built on top of the logic rules.

Coverage-guided tensor compiler fuzzing with joint IR-pass mutation

In the past decade, Deep Learning (DL) systems have been widely deployed in various application domains to facilitate our daily life, e.g., natural language processing, healthcare, activity recognition, and autonomous driving. Meanwhile, it is extremely challenging to ensure the correctness of DL systems (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can cause serious consequences and may even threaten human lives. In the literature, researchers have explored various techniques to test, analyze, and verify DL models, since their quality directly affects the corresponding system behaviors. Recently, researchers have also proposed novel techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations for each high-level DL operator and are the foundation for running DL models on different hardware platforms. However, there is still limited work targeting the reliability of the emerging tensor compilers (also known as DL compilers), which aim to automatically compile high-level tensor computation graphs directly into high-performance binaries for better efficiency, portability, and scalability than traditional operator-level libraries. Therefore, in this paper, we target the important problem of tensor compiler testing, and have proposed Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating the low-level Intermediate Representation (IR) for TVM due to the limited mutation space for the high-level IR. More specifically, Tzer leverages both general-purpose and tensor-compiler-specific mutators guided by coverage feedback for diverse and evolutionary IR mutation; furthermore, since tensor compilers provide various passes (i.e., transformations) for IR optimization, Tzer also performs pass mutation in tandem with IR mutation for more effective fuzzing. Our experimental results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, with 75% higher coverage and 50% more valuable tests than the 2nd-best technique. Also, different components of Tzer have been validated via ablation study. To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed (PR merged).

Le temps des cerises: efficient temporal stack safety on capability machines using directed capabilities

Capability machines are a type of CPUs that support fine-grained privilege separation using capabilities, machine words that include forms of authority. Formal models of capability machines and associated calling conventions have so far focused on establishing two forms of stack safety properties, namely local state encapsulation and well-bracketed control flow. We introduce a novel kind of directed capabilities and show how to use them to make an earlier suggested calling convention more efficient. In contrast to earlier work on capability machine models we do not only consider integrity properties but also confidentiality properties; we provide a unary logical relation to reason about the former and a binary logical relation to reason about the latter, each expressive enough to reason about temporal stack safety. While the logical relations are useful for reasoning about concrete examples, they do not on their own demonstrate that stack safety holds for a large class of programs. Therefore, we also show full abstraction of a compiler from an overlay semantics that internalizes the calling convention as a single call step and explicitly keeps track of the call stack and frame lifetimes to a base capability machine. All results have been mechanized in Coq.

Proof transfer for fast certification of multiple approximate neural networks

Developers of machine learning applications often apply post-training neural network optimizations, such as quantization and pruning, that approximate a neural network to speed up inference and reduce energy consumption, while maintaining high accuracy and robustness.

Despite a recent surge in techniques for the robustness verification of neural networks, a major limitation of almost all state-of-the-art approaches is that the verification needs to be run from scratch every time the network is even slightly modified. Running precise end-to-end verification from scratch for every new network is expensive and impractical in many scenarios that use or compare multiple approximate network versions, and the robustness of all the networks needs to be verified efficiently.

We present FANC, the first general technique for transferring proofs between a given network and its multiple approximate versions without compromising verifier precision. To reuse the proofs obtained when verifying the original network, FANC generates a set of templates – connected symbolic shapes at intermediate layers of the original network – that capture the proof of the property to be verified. We present novel algorithms for generating and transforming templates that generalize to a broad range of approximate networks and reduce the verification cost.

We present a comprehensive evaluation demonstrating the effectiveness of our approach. We consider a diverse set of networks obtained by applying popular approximation techniques such as quantization and pruning on fully-connected and convolutional architectures and verify their robustness against different adversarial attacks such as adversarial patches, L0, rotation and brightening. Our results indicate that FANC can significantly speed up verification with state-of-the-art verifier, DeepZ by up to 4.1x.

Effects, capabilities, and boxes: from scope-based reasoning to type-based reasoning and back

Reasoning about the use of external resources is an important aspect of many practical applications. Effect systems enable tracking such information in types, but at the cost of complicating signatures of common functions. Capabilities coupled with escape analysis offer safety and natural signatures, but are often overly coarse grained and restrictive. We present System C, which builds on and generalizes ideas from type-based escape analysis and demonstrates that capabilities and effects can be reconciled harmoniously. By assuming that all functions are second class, we can admit natural signatures for many common programs. By introducing a notion of boxed values, we can lift the restrictions of second-class values at the cost of needing to track degree-of-impurity information in types. The system we present is expressive enough to support effect handlers in full capacity. We practically evaluate System C in an implementation and prove its soundness.

Elipmoc: advanced decompilation of Ethereum smart contracts

Smart contracts on the Ethereum blockchain greatly benefit from cutting-edge analysis techniques and pose significant challenges. A primary challenge is the extremely low-level representation of deployed contracts. We present Elipmoc, a decompiler for the next generation of smart contract analyses. Elipmoc is an evolution of Gigahorse, the top research decompiler, dramatically improving over it and over other state-of-the-art tools, by employing several high-precision techniques and making them scalable. Among these techniques are a new kind of context sensitivity (termed “transactional sensitivity”) that provides a more effective static abstraction of distinct dynamic executions; a path-sensitive (yet scalable, through path merging) algorithm for inference of function arguments and returns; and a fully context sensitive private function reconstruction process. As a result, smart contract security analyses and reverse-engineering tools built on top of Elipmoc achieve high scalability, precision and completeness.

Elipmoc improves over all notable past decompilers, including its predecessor, Gigahorse, and the state-of-the-art industrial tool, Panoramix, integrated into the primary Ethereum blockchain explorer, Etherscan. Elipmoc produces decompiled contracts with fully resolved operands at a rate of 99.5% (compared to 62.8% for Gigahorse), and achieves much higher completeness in code decompilation than Panoramix—e.g., up to 67% more coverage of external call statements—while being over 5x faster. Elipmoc has been the enabler for recent (independent) discoveries of several exploitable vulnerabilities on popular protocols, over funds in the many millions of dollars.

C to checked C by 3c

Owing to the continued use of C (and C++), spatial safety violations (e.g., buffer overflows) still constitute one of today's most dangerous and prevalent security vulnerabilities. To combat these violations, Checked C extends C with bounds-enforced checked pointer types. Checked C is essentially a gradually typed spatially safe C - checked pointers are backwards-binary compatible with legacy pointers, and the language allows them to be added piecemeal, rather than necessarily all at once, so that safety retrofitting can be incremental. This paper presents a semi-automated process for porting a legacy C program to Checked C. The process centers on 3C, a static analysis-based annotation tool. 3C employs two novel static analysis algorithms - typ3c and boun3c - to annotate legacy pointers as checked pointers, and to infer array bounds annotations for pointers that need them. 3C performs a root cause analysis to direct a human developer to code that should be refactored; once done, 3C can be re-run to infer further annotations (and updated root causes). Experiments on 11 programs totaling 319KLoC show 3C to be effective at inferring checked pointer types, and experience with previously and newly ported code finds 3C works well when combined with human-driven refactoring.

Applying cognitive principles to model-finding output: the positive value of negative information

Model-finders, such as SAT/SMT-solvers and Alloy, are used widely both directly and embedded in domain-specific tools. They support both conventional verification and, unlike other verification tools, property-free exploration. To do this effectively, they must produce output that helps users with these tasks. Unfortunately, the output of model-finders has seen relatively little rigorous human-factors study.

Conventionally, these tools tend to show one satisfying instance at a time. Drawing inspiration from the cognitive science literature, we investigate two aspects of model-finder output: how many instances to show at once, and whether all instances must actually satisfy the input constraints. Using both controlled studies and open-ended talk-alouds, we show that there is benefit to showing negative instances in certain settings; the impact of multiple instances is less clear. Our work is a first step in a theoretically grounded approach to understanding how users engage cognitively with model-finder output, and how those tools might better support users in doing so.

C4: verified transactional objects

Transactional objects combine the performance of classical concurrent objects with the high-level programmability of transactional memory. However, verifying the correctness of transactional objects is tricky, requiring reasoning simultaneously about classical concurrent objects, which guarantee the atomicity of individual methods—the property known as linearizability—and about software-transactional-memory libraries, which guarantee the atomicity of user-defined sequences of method calls—or serializability.

We present a formal-verification framework called C4, built up from the familiar notion of linearizability and its compositional properties, that allows proof of both kinds of libraries, along with composition of theorems from both styles to prove correctness of applications or further libraries. We apply the framework in a significant case study, verifying a transactional set object built out of both classical and transactional components following the technique of transactional predication; the proof is modular, reasoning separately about the transactional and nontransactional parts of the implementation. Central to our approach is the use of syntactic transformers on interaction trees—i.e., transactional libraries that transform client code to enforce particular synchronization disciplines. Our framework and case studieswe mentioned only one above...? are mechanized in Coq.

Finding real bugs in big programs with incorrectness logic

Incorrectness Logic (IL) has recently been advanced as a logical theory for compositionally proving the presence of bugs—dual to Hoare Logic, which is used to compositionally prove their absence. Though IL was motivated in large part by the aim of providing a logical foundation for bug-catching program analyses, it has remained an open question: is IL useful only retrospectively (to explain existing analyses), or can it actually be useful in developing new analyses which can catch real bugs in big programs?

In this work, we develop Pulse-X, a new, automatic program analysis for catching memory errors, based on ISL, a recent synthesis of IL and separation logic. Using Pulse-X, we have found 15 new real bugs in OpenSSL, which we have reported to OpenSSL maintainers and have since been fixed. In order not to be overwhelmed with potential but false error reports, we develop a compositional bug-reporting criterion based on a distinction between latent and manifest errors, which references the under-approximate ISL abstractions computed by Pulse-X, and we investigate the fix rate resulting from application of this criterion. Finally, to probe the potential practicality of our bug-finding method, we conduct a comparison to Infer, a widely used analyzer which has proven useful in industrial engineering practice.

Purity of an ST monad: full abstraction by semantically typed back-translation

In 1995, Launchbury and Peyton Jones extended Haskell with an ST monad that allows the programmer to use higher-order mutable state. They informally argued that these state computations were safely encapsulated, and as such, that the rich reasoning principles stemming from the purity of the language, were not threatened.

In this paper, we give a formal account of the preservation of purity after adding an ST monad to a simply-typed call-by-value recursive lambda calculus. We state and prove full abstraction when embedding the pure language into its extension with ST; contextual equivalences from the pure language continue to hold in the presence of ST.

Proving full abstraction of compilers is usually done by emulating or back-translating the target features (here: ST computations) into the source language, a well-known challenge in the secure compilation community. We employ a novel proof technique for proving our full abstraction result that allows us to use a semantically (but not syntactically) typed back-translation into an intermediate language. We believe that this technique provides additional insight into our proof and that it is of general interest to researchers studying programming languages and compilers using full abstraction.

The results presented here are fully formalized in the Coq proof assistant using the Iris framework.

Translating canonical SQL to imperative code in Coq

SQL is by far the most widely used and implemented query language. Yet, on some key features, such as correlated queries and NULL value semantics, many implementations diverge or contain bugs. We leverage recent advances in the formalization of SQL and query compilers to develop DBCert, the first mechanically verified compiler from SQL queries written in a canonical form to imperative code. Building DBCert required several new contributions which are described in this paper. First, we specify and mechanize a complete translation from SQL to the Nested Relational Algebra which can be used for query optimization. Second, we define Imp, a small imperative language sufficient to express SQL and which can target several execution languages including JavaScript. Finally, we develop a mechanized translation from the nested relational algebra to Imp, using the nested relational calculus as an intermediate step.

End-to-end translation validation for the halide language

This paper considers the correctness of domain-specific compilers for tensor programming languages through the study of Halide, a popular representative. It describes a translation validation algorithm for affine Halide specifications, independently of the scheduling language. The algorithm relies on “prophetic” annotations added by the compiler to the generated array assignments. The annotations provide a refinement mapping from assignments in the generated code to the tensor definitions from the specification. Our implementation leverages an affine solver and a general SMT solver, and scales to complete Halide benchmarks.

Language-parametric static semantic code completion

Code completion is an editor service in IDEs that proposes code fragments for the user to insert at the caret position in their code. Code completion should be sound and complete. It should be sound, such that it only proposes fragments that do not violate the syntactic and static semantic rules of the language. It should be complete, such that it proposes all valid fragments so that code completion can be used to construct all programs. To realize soundness and completeness, code completion should be informed by the language definition. In practice, the implementation of code completion is an additional effort in the implementation of a language.

In this paper, we develop a framework for language-parametric semantic code completion for statically typed programming languages based on their specification of syntax and static semantics, realizing the implementation of a code completion editor service with minimal additional effort. The framework builds on the SDF3 syntax definition formalism and the Statix static semantics specification language. The algorithm reinterprets the static semantics definition to find sound expansions of predicates and solutions to name resolution queries in scope graphs. This allows a search strategy to explore the solution space and synthesize completion proposals. The implementation of the strategy language and code completion algorithm extend the implementation of the Statix solver, and can be used for any language defined in Statix. We demonstrate soundness and completeness of the completion proposal synthesis, and evaluate its performance.

Bugs in Quantum computing platforms: an empirical study

The interest in quantum computing is growing, and with it, the importance of software platforms to develop quantum programs. Ensuring the correctness of such platforms is important, and it requires a thorough understanding of the bugs they typically suffer from. To address this need, this paper presents the first in-depth study of bugs in quantum computing platforms. We gather and inspect a set of 223 real-world bugs from 18 open-source quantum computing platforms. Our study shows that a significant fraction of these bugs (39.9%) are quantum-specific, calling for dedicated approaches to prevent and find them. The bugs are spread across various components, but quantum-specific bugs occur particularly often in components that represent, compile, and optimize quantum programming abstractions. Many quantum-specific bugs manifest through unexpected outputs, rather than more obvious signs of misbehavior, such as crashes. Finally, we present a hierarchy of recurrent bug patterns, including ten novel, quantum-specific patterns. Our findings not only show the importance and prevalence bugs in quantum computing platforms, but they help developers to avoid common mistakes and tool builders to tackle the challenge of preventing, finding, and fixing these bugs.

Quantitative strongest post: a calculus for reasoning about the flow of quantitative information

We present a novel strongest-postcondition-style calculus for quantitative reasoning about non-deterministic programs with loops. Whereas existing quantitative weakest pre allows reasoning about the value of a quantity after a program terminates on a given initial state, quantitative strongest post allows reasoning about the value that a quantity had before the program was executed and reached a given final state. We show how strongest post enables reasoning about the flow of quantitative information through programs. Similarly to weakest liberal preconditions, we also develop a quantitative strongest liberal post. As a byproduct, we obtain the entirely unexplored notion of strongest liberal postconditions and show how these foreshadow a potential new program logic - partial incorrectness logic - which would be a more liberal version of O'Hearn's recent incorrectness logic.

SHARP: fast incremental context-sensitive pointer analysis for Java

We present SHARP, an incremental context-sensitive pointer analysis algorithm that scales to real-world large complex Java programs and can also be efficiently parallelized. To our knowledge, SHARP is the first algorithm to tackle context-sensitivity in the state-of-the-art incremental pointer analysis (with regards to code modifications including both statement additions and deletions), which applies to both k-CFA and k-obj. To achieve it, SHARP tackles several technical challenges: soundness, redundant computations, and parallelism to improve scalability without losing precision. We conduct an extensive empirical evaluation of SHARP on large and popular Java projects and their code commits, showing impressive performance improvement: our incremental algorithm only requires on average 31 seconds to handle a real-world code commit for k-CFA and k-obj, which has comparable performance to the state-of-the-art incremental context-insensitive pointer analysis. Our parallelization further improves the performance and enables SHARP to finish within 18 seconds per code commit on average on an eight-core machine.

Functional collection programming with semi-ring dictionaries

This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using semi-ring dictionaries. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads, by putting together optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and optimizations. SDQL can be competitive with or outperforms a host of systems that are state of the art in their own domain: in-memory DB systems Typer and Tectorwise for (flat, not nested) relational data; SciPy for LA workloads; sparse tensor compiler taco; the Trance nested relational engine; and the in-database machine learning engines LMFAO and Morpheus for hybrid DB/LA workloads over relational data.