Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies
Authors: Bastian Hagedorn (University of Münster, Germany), Johannes Lenfers (University of Münster, Germany), Thomas Koehler (University of Glasgow), Xueying Qin (University of Glasgow, United Kingdom), Sergei Gorlatch (University of Muenster), Michel Steuwer (University of Edinburgh, United Kingdom)
Venue: ICFP 2020
High-performance array code, for applications such as machine learning or image processing, needs both good algorithms and highly tuned code. While the algorithms are quite general, the tuning–involving optimisations such as tiling, vectorisation, and loop unrolling–is very platform specific. This paper cleanly separates those concerns, providing domain-specific languages for specifying the algorithm and the optimisations independently, with an optimisation language that supports abstraction and reuse properly for the first time. As a result we can enjoy elegance, and state-of-the-art performance, both at the same time. Sometimes we can have our cake and eat it too.
Learning-based Memory Allocation for C++ Server Workloads
Authors: Martin Maas (Google), David Andersen (CMU), Michael Isard (Google), Mohammad Mahdi (Facebook), Kathryn S McKinley (Google), Colin Raffel (University of North Carolina)
Venue: ASPLOS 2020
As C++ server workloads move to using huge pages to avoid TLB pressure, memory fragmentation issues have been exacerbated: even a small number of objects left allocated can prevent the OS from reclaiming a virtual memory page. While garbage-collected languages can avoid this problem by moving objects around, for the most part, C and C++ programs cannot (though some recent work tackles some of the problem through virtual memory remapping). A classic insight for tackling this problem – objects of similar lifetime should get allocated to the same page – does not work effectively for server workloads: profiling based techniques do not have sufficient coverage, and machine-learning techniques are not accurate enough nor discriminating enough to reduce fragmentation in the huge pages that server workloads use. This paper shows that sophisticated machine learning techniques, based on LSTMs, can avoid the pitfalls of prior work by accurately predicting a range of placement policies for objects, even for never-before-seen behaviors. On real-world workloads, the paper’s approach reduces fragmentation by up to 78%.
Liquid information flow control
Authors: Nadia Polikarpova (University of California, San Diego), Deian Stefan (UC San Diego), Jean Yang (Carnegie Mellon University), Shachar Itzhaky (Technion), Travis Hance (CMU), Armando Solar-Lezama (Massachusetts Institute of Technology)
Venue: ICFP 2020
Reasoning about information flow in programs allows programmers to understand how sensitive data flows through their applications; even better are tools that allow programmers to specify declarative policies for information flow, allowing software systems to verify that sensitive information does not leak to untrusted parties. In the world of information flow control, prior work has either allowed programs to be declared safe at compile time or allowed programs to use sophisticated information flow policies, but not both. This paper shows how to get the best of both worlds by leveraging recent advances in type systems. Even better, this paper shows how programmers can fix programs that have information flow problems by leveraging program synthesis.
Proving Data-Poisoning Robustness in Decision Trees
Authors: Samuel Drews, Aws Albarghouthi, Loris D'Antoni (University of Wisconsin–Madison)
Venue: PLDI 2020
The topic of ensuring that machine learning models are safe and robust has gained substantial interest in recent years. Indeed, the ability to provide guarantees that the model works as expected is of fundamental importance to its confident usage. While most work has so far focused on proving properties relevant to test and inference time, there has been much less work on ensuring that other phases of the machine learning pipeline, such as training, work correctly, yet those phases are as important. This is the first paper which proposes a method that can ensure the training procedure of a machine learning model, here decision trees, is safe against poisoning attacks, a very real and natural property that can arise due to the difficulty of manually examining the (large) training set. The paper is very well written and is cleanly phrased in abstract interpretation terms; we expect it to have substantial impact and to trigger more work on guarantees provided in the training phase of other models, such as deep neural networks. Overall, an excellent paper of broad interest, which demonstrates for the first time that one can obtain deterministic certificates on properties of the training phase of realistic machine learning models.
Provably and Practically Efficient Granularity Control
Authors: Umut A. Acar (CMU and Inria), Vitaly Aksenov (Inria and ITMO University), Arthur Charguéraud (Inria & Université de Strasbourg, CNRS, ICube), Mike Rainey (Indiana University and Inria)
Venue: PPoPP 2019
Determining the granularity of parallel tasks is a long-standing challenge for parallelizing programs. Overly fine-grained parallelization incurs excessive overhead due to synchronization and control. Overly coarse-grained parallelism limits parallelism by impeding load balancing. Finding the grain size that hits the sweet spot usually requires non-trivial tuning effort. Almost all parallel programming models (Cilk, OpenMP, TBB, etc.) require the programmer to manually identify this grain size, which is usually program and machine dependent.
This paper presents a solution to this problem: a practical and provably efficient general automatic mechanism for finding the right granularity size for parallel programs. Their approach works by gradually coarsening the granularity until parity is achieved between the parallel and sequential versions. Its practical usability is evaluated against hand tuned granularity selection on the PBBS benchmark. The results often come close to hand-picked grain sizes and even outperform them in some cases. The authors provide a theorem that shows that their control algorithm achieves an execution time of within a small constant factor of the optimal solution (under some assumptions that are met by a wide variety of programs).
code2vec: Learning Distributed Representations of Code
Authors: Uri Alon (Technion, Israel), Meital Zilberstein (Technion, Israel), Omer Levy (Facebook AI), Eran Yahav (Technion, Israel)
Venue: POPL 2019
Given a code snippet, what is the likely name of the function it implements? Can we predict properties of this snippet, such as how similar it is to other code snippets? Can we answer these questions without executing any code, given that we have access to large code databases? This work presents a novel methodology for addressing these questions by learning from data a so-called distributed representation of code snippets, similar to the distributed representations of words or documents from the natural language processing domain. The key technical innovation is a model that applies soft attention to the representations of all paths in a code snippet; and can thus learn to identify the most relevant paths for the task in hand. This highly-cited paper reports significant advances over previous work in predicting method names compared to previous work, includes a qualitative evaluation of the interpretability of the learnt attention over the paths of a program, comes with an open-source implementation, as well as online interactive demos.
PlanAlyzer: assessing threats to the validity of online experiments
Authors: Emma Tosch (UMass), Eytan Bakshy(Facebook), Emery Berger (UMass), David D. Jensen (UMass), and J. Eliot B. Moss (UMass)
Venue: OOPSLA 2019
The paper presents PlanAlyzer, a first-of-its-kind static analyzer for the domain specific language (DSL) PlanOut. PlanOut is a leading DSL for expressing online experiments, which are a critical tool for designing and engineering online processes by internet companies. PlanAlyzer detects errors that can impact the validity of statistical conclusions. Previously, such experiments were only validated and analyzed by a human ad hoc effort. The paper lists various frequent potential pitfalls that would lead to an erroneous analysis of experiments, including failures of randomization, treatment assignment, or causal sufficiency. PlanAlyzer statically checks whether an input program involves one of these bad practices. Experimental results are provided for a few years’ worth of real PlanOut programs from Facebook. A subset of these programs are mutated and PlanAlyzer yields precision and recall of 92% on the mutated corpus.
Published as PlanAlyzer: assessing threats to the validity of online experiments in September 2021 CACM Research Highlight, with Technical Perspective: Does your experiment smell? by Stefano Balietti.
BioScript: Programming Safe Chemistry on Laboratories-on-a-chip
Authors: Jason Ott (UC Riverside), Tyson Loveless (UC Riverside), Chris Curtis (UC Riverside), Mohsen Lesani (UC Riverside), Philip Brisk (UC Riverside)
Venue: OOPSLA 2018
Laboratories on a chip (LoCs) can be seen as the ‘general-purpose’ CPU of microfluidic devices. This paper describes the design and implementation of “BioScript”, a programming language for LoCs that allows biochemists to express microfluid experiments as programs. This research exemplifies the use of techniques from programming language theory and practice in order to satisfy the constraints imposed by these systems while producing a language that is fit for its purpose. This includes designing a type systems that helps prevent scientists from accidentally causing unsafe chemical reactions that could damage the lab itself or produce toxic gas.
The authors demonstrate that as more platforms require programming — increasingly by non-computer scientists — the creative mix and application of well-studied theory can provide meaningful design guidance.
Published as BioScript: Programming Safe Chemistry on Laboratories-on-a-chip in January 2021 CACM Research Highlight, with Technical Perspective: Programming microfluidics to execute biological protocols by Nada Amin.
MadMax: surviving out-of-gas conditions in Ethereum smart contracts
Authors: Neville Grech (University of Athens, Greece / University of Malta, Malta), Michael Kong (University of Sydney, Australia), Anton Jurisevic (University of Sydney, Australia), Lexi Brent (University of Sydney, Australia), Bernhard Scholz (University of Sydney, Australia), Yannis Smaragdakis (University of Athens, Greece)
Venue: OOPSLA 2018
Ethereum is the world’s second largest cryptocurrency, with a current market capitalization over $20 billion. It was the first to support smart contracts—Turing complete programs stored on the blockchain, and executed by the Ethereum Virtual Machine, which perform transactions with no human interaction. Bugs in these contracts can enable theft, or render the managed funds permanently inaccessible.
This paper addresses a particular kind of bug, exploiting the fact that smart contracts consume “gas” as they run, and calls that run out of gas are aborted. If an attacker can corrupt the state of a contract so that all future calls run out of gas, then the funds it manages are permanently lost. MadMax can analyze any Ethereum contract by decompiling its byte code, then searching for code patterns that are likely to be associated with three common kinds of “out of gas” vulnerabilities, using the successful Datalog approach to implement a data-flow analysis, context sensitive flow analysis, and memory modeling for data structures. The result is an accurate and scalable analysis that the authors applied to every smart contract on the Ethereum blockchain—over 90,000 of them. In 10 hours of analysis, MadMax flagged over 5% of these contracts (managing about $2.8 billion) as potentially vulnerable to one of the three attacks. Inspection of the first 13 flagged contracts, with 16 flagged vulnerabilities, showed that 13 vulnerabilities were real—so only around 20% of flagged vulnerabilities are false positives.
Blockchain technology is currently attracting significant interest, and security requirements are at its core. This paper shows beautifully how state-of-the-art programming-language techniques can be brought to bear to improve the security of the blockchain infrastructure we may all depend on in the future.
Published as MadMax: surviving out-of-gas conditions in Ethereum smart contracts in September 2020 CACM Research Highlight, with Technical Perspective: Analyzing smart contracts with MadMax by Benjamin Livshits.
BLeak: automatically debugging memory leaks in web applications
Authors: John Vilk (University of Massachusetts at Amherst), Emery Berger (University of Massachusetts at Amherst)
Venue: PLDI 2018
This paper presents a manifestly extremely useful framework for detecting memory leaks in web applications. The programmer provides a script that specifies a “visible loop” in terms of states in a web application, and the BLeak system flags objects that consistently exhibit a growing set of outgoing references as potential leaks. Via appropriate hooks, any addition of such a growth reference triggers the collection of a stack trace for debugging. In addition, the tool attributes importance to leak roots by counting the sizes of reachable nodes (normalized by their count) in order to help programmers identify where they should be focusing their debugging effort first. The results demonstrate great effectiveness: nearly 60 actual and never-found-before bugs were identified and fixed in real-world applications, leading to significant reductions in memory.
Published as BLeak: automatically debugging memory leaks in web applications in November 2020 CACM Research Highlight, with Technical Perspective: BLeak: semantics-aware leak detection in the web by Harry Xu.
Watching for Software Inefficiencies with Witch
Authors: Shasha Wen (College of William and Mary), Xu Liu (College of William and Mary), John Byrne (Hewlett Packard Labs), Milind Chabbi (Baidu Research & Scalable Machines Research)
Venue: ASPLOS 2018
This paper makes a substantial improvement to the state of production performance profiling tools. The paper is widely accessible and will be of interest to application, tool, and system developers.
Understanding application performance while it runs in production remains an open problem with implications on what we can compute at any given time scale and how we compute it. Historically, developers had two types of performance profilers at their disposal. (1) Coarse-grained profiles interrupt programs and sample performance counters or software events infrequently (e.g., gprof). (2) Fine-grained profilers instrument programs (e.g., DeadSpy) or require an additional thread and software configuration (e.g., SHIM). These types of tools make different trade offs. Coarse-grained profilers are efficient, but they only detect only a limited class of performance trends, e.g., the dominant method or thread that holds a lock for a long time. Fine-grained profilers detect more nuanced performance issues, such as useless writes, poor cache behavior, or program critical paths, but most have high overhead because they require program instrumentation or require lots of state, making them impractical for finding problems in deployment, where software is exercised in hard to reproduce and unexpected ways.
This paper introduces a new extremely low-overhead reservoir sampling approach that requires no program instrumentation and limited state. It uses hardware debugging registers and perfomance monitoring units (PMU) to capture values of memory locations. They prove the efficacy of the sampling approach to capture events that happen frequently enough to cause performance problems. (Rare events are important for correctness tools, but less so for performance tools.) The paper demonstrates their approach is low overhead and yet reveals performance problems. It shows their tool reveals actionable performance bugs related to consecutive reads and writes to the same memory location (e.g., dead writes, silent stores, and redundant loads), but the approach is general and will apply to many other pre-configured performance pathologies. They show practical impact through changes they or developers make to a range of large complicated multithreaded applications. Application and system developers will want to use it.
Bringing the Web up to Speed with WebAssembly
Authors: Andreas Haas (Google, Germany), Andreas Rossberg (Google, Germany), Derek L. Schuff (Google, USA), Ben L. Titzer (Google, Germany), Michael Holman (Microsoft, USA), Dan Gohman (Mozilla, USA), Luke Wagner (Mozilla, USA), Alon Zakai (Mozilla, USA), JF Bastien (Apple, USA)
Venue: PLDI 2017
The effort is a collaboration between all major browser vendors, represents an impressive application of programming language theory to a real-world problem, and is expected to have a substantial impact in making the Web more reliable and secure.
Published as Bringing the web up to speed with WebAssembly in December 2018 CACM Research Highlight, with Technical Perspective: WebAssembly–a quiet revolution of the web by Anders Møller.
Types from Data: Making Structured Data First-Class Citizens in F#
Authors: Tomas Petricek (University of Cambridge, UK), Gustavo Guerra (Microsoft, UK), Don Syme (Microsoft Research, UK)
Venue: PLDI 2016
The paper presents a practical solution to a common challenge: web programming with unstructured data in such formats as JSON, XML, and CSV. The solution is principled and uses the known idea of type inference in a new way, namely for external data at run time. Specifically, the paper presents a type system for external data that helps shorten access code and avoid unsafe lookups during data traversal. The paper also shows how to infer a type from example data, which makes the type system practical useable. Type inference happens at run-time and maps example data to a type that subsequent computation can use. In more detail, type inference proceeds in two steps: first a shape inferencer maps data to shapes via use of row variables, and then a type provider maps shapes to types. A key technical aspect of the shape inferencer is a preference relation that reflects both the authors’ design principles and the host language F#. The paper proves the safety of code that uses the inferred types as long as the input data is no crazier than the example data.
Black-box Concurrent Data Structures for NUMA Architectures
Authors: Irina Calciu (VMware Research, Palo Alto, CA, USA), Siddhartha Sen (Microsoft Research, New York, NY, USA), Mahesh Balakrishnan (Yale University, New Haven, CT, USA), Marcos K. Aguilera (VMware Research, Palo Alto, CA, USA)
Venue: ASPLOS 2017
The evolution of multicore hardware has reached the point in which a single server divides its cores and memory into two or more Non Uniform Memory Access (NUMA) domains. Designing concurrent data structures is very challenging in general, and these NUMA architectures make it harder. This paper exploits the observation that NUMA multicore architectures increasingly resemble distributed systems. It introduces an approach for transforming sequential data structures into NUMA aware concurrent data structures. The approach combines data replication with per NUMA logging of updates, borrowed from distributed systems algorithms, and shared memory data consistency protocols. The approach is surprisingly effective for a range of data structures. This paper is not the final word, but it points in a new direction for designing concurrent algorithms for shared memory NUMA architectures.
Published as How to implement any concurrent data structure in December 2018 CACM Research Highlight, with Technical Perspective: Node replication divides to conquer by Tim Harris.
Predicting Program Properties from “Big Code”
Authors: Veselin Raychev (ETH Zürich), Martin Vechev (ETH Zürich), Andreas Krause (ETH Zürich)
Venue: POPL 2015
Published as Predicting program properties from big code in March 2019 CACM Research Highlight, with Technical Perspective: Borrowing big code to automate programming activities by Martin C. Rinard.
Authors: Steffen Smolka (Cornell University), Spiridon Eliopoulos (Inhabited Type), Nate Foster (Cornell University), Arjun Guha (University of Massachusetts Amherst)
Venue: ICFP 2015
The paper presents a compiler that translates high-level programs to OpenFlow code for software-defined networks. The compiler deals with important challenges: supporting network-wide behavior, abstracting concrete network topologies, and achieving high confidence in correctness. The language, called NetKAT, is based on Kozen’s Kleene Algebra with Tests; a few additional features enable network programming. The language is expressive, the compilation times are short, and the generated code scales to large networks. The translation proceeds in three nontrivial steps that use a variety of techniques including a variant of binary decision diagrams, NetKAT automata, and a novel idea of fabric construction.
Provably Correct Peephole Optimizations with Alive
Authors: Nuno P. Lopes (Microsoft Research), David Menendez (Rutgers University), Santosh Nagarakatte (Rutgers University), John Regehr (University of Utah)
Venue: PLDI 2015
This paper introduces Alive, a DSL in which peephole optimizations can be written and their correctness automatically verified. The work is evaluated in the context of LLVM with impressive results. The LLVM developers have embraced the work, which makes this work an excellent example of the practical use of formal techniques to improve reliability of software that is used on a day-to-day basis. Alive illustrates one of the key benefits of programming languages: by moving the level of discourse for some problem to just the right level, one gains tremendous benefits. In this case, the design of a little DSL for specifying peephole optimizations hits a sweet spot where one can both have practical, formal verification of the optimizations’ correctness and generate efficient C++ implementations of the transformations.
Published as Practical Verification of Peephole Optimizations with Alive in February 2018 CACM Research Highlight, with Technical Perspective: Building Bug-Free Compilers by Steve Zdancewic.
Parallelizing Dynamic Programming through Rank Convergence
Authors: Saeed Maleki (University of Illinois at Urbana-Champaign), Madanlal Musuvathi (Microsoft Research), Todd Mytkowicz (Microsoft Research)
Venue: PPoPP 2014
This paper describes a rather surprising parallelization of several dynamic programming algorithms, by running what would normally be considered to be several different dependent wavefronts in parallel. The authors show that as a result of interesting algebraic properties of these algorithms, it is actually possible to do a lot of useful work in a dependent stage by assuming an essentially random output from the previous stage. The amount of work that must be redone to account for the actual output is very limited. It is a great example of the creative application of relatively basic abstract algebra.
Published as Efficient Parallelization Using Rank Convergence in Dynamic Programming Algorithms in October 2016 CACM Research Highlight, with Technical Perspective: The Power of Parallelizing Computations by James Larus.
Doppio: Breaking the Browser Language Barrier
Authors: John Vilk (University of Massachusetts), Emery Berger (University of Massachusetts)
Venue: PLDI 2014
Uncertain<T>: a First-order Type for Uncertain Data
Authors: James Bornholt (Australian National University), Todd Mytkowicz (Microsoft Research), Kathryn S. McKinley (Microsoft Research)
Venue: ASPLOS 2014
Programmers increasingly have to reason about, and write code, that depends on values that are approximate. Such values result from sensor readings, the use of approximate hardware, or dependence on a statistical reasoning software component such as machine learning. This paper presents a practical and well-motivated solution to this problem based on creating a generic type, Uncertain<T>, that captures the underlying semantics of uncertain data. The authors illustrate their idea with concrete examples of its use, motivating it as increasingly important abstraction for developers to have and use.
Functional Geometry and the Traite’ de Lutherie
Author: Harry Mairson (Brandeis University)
Venue: ICFP 2013
This paper presents a programming language designed to explain how string instrument makers codified their knowledge of instrument making. It builds on François Denis’s history book Traite’ de lutherie, observing that there is an underlying programming language behind the descriptions of the process for making an instrument. Even better, the descriptions are often overly verbose because they are low-level and repetitious, enough so that the usual benefits of abstraction can help others learn these skills more effectively — building and using a few helper functions here and there can make this “code” much easier to read and learn from. The paper also demonstrates thru lots of code fragments how to describe several instruments, culminating in a violin by Andrea Amati.
Exploiting Vector Instructions with Generalized Stream Fusion
Authors: Geoffrey Mainland (Microsoft Research), Roman Leshchinskiy, Simon Peyton Jones (Microsoft Research)
Venue: ICFP 2013
This paper demonstrates how a generalized notion of streams and accompanying stream fusion techniques can produce code significantly faster than hand-tuned C code for certain kinds of high-level functional code. For example, the Haskell implementation of dot product (on vectors of floats) can be turned into code that uses the SSE instructions in an unrolled loop for performance on par with ddotp from GotoBLAS (highly tuned assembly language routines that are widely recognized to be the fastest available and generally faster than what GCC can produce for hand-tuned C code).
Published as Exploiting Vector Instructions with Generalized Stream Fusion in May 2017 CACM Research Highlight, with Technical Perspective: Functional Compilers by Guy Blelloch.
Fun with Semirings: A Functional Pearl on the Abuse of Linear Algebra
Author: Stephen Dolan (University of Cambridge)
Venue: ICFP 2013
The paper demonstrates common algebraic structure shared by, among other things, sets, graphs, regular expressions, imperative programs, and datatypes. It then shows how this structure can be exploited in a generic way by a library to provide implementations of dominators, a function that turns a state machine into a corresponding regular expression, solutions to dataflow equations, and the knapsack problem. The key insight is that all of these problems are instances of linear algebra problems – not over fields like the complex or real numbers, but instead over a “closed semi-ring”, a structure with a bit less structure than a field. Importantly, a closed semi-ring doesn’t have a notion of negation or reciprocals, which is what makes it work for all those structures mentioned earlier, as they don’t have a natural notion of negation of reciprocal. Although most of the individual results in this paper were previously known in specific communities, this paper does an exceptional job of pulling all the threads together and giving an overview of the generality of the abstraction.
Verifying Quantitative Reliability of Programs That Execute on Unreliable Hardware
Authors: Michael Carbin (MIT), Sasa Misailovic (MIT), Martin Rinard (MIT)
Venue: OOPSLA 2013
The paper provides an important step forward in reasoning about programs that execute on unreliable hardware. It provides a strong methodology for ensuring (probabilistic) correctness guarantees in the face of such unreliability, which can enable such hardware to be used effectively and robustly.
Empirical Analysis of Programming Language Adoption
Authors: Leo Meyerovich (University of California at Berkeley), Ariel Rabkin (Princeton University)
Venue: OOPSLA 2013
Based on a large survey and large repository analysis, this paper draws very interesting conclusions about what makes a programming language be adopted in practice (or not). Many design and implementation details that researchers and language designers traditionally find important (i.e. intrinsic features) seem not to be that important in practice. Instead, open source libraries, existing code, and experience strongly influence developers when selecting a language for a project. When considering intrinsic aspects of languages, developers prioritize expressivity over correctness. They perceive static types as primarily helping with the latter. Another finding is that developers quickly pick up new languages when needed. The reported data is interesting and gives plenty of food for thought for language designers. The study should also be interesting for anyone managing large software projects and having to make language choices.
Two of the papers selected in this round are “functional pearls”, which are elegant, instructive examples of functional programming. They can explain an old idea, but do so in a new way that clarifies the idea and yields new insights. Jon Bentley’s Programming Pearls column in CACM inspired Richard Bird’s Functional Pearls column in the Journal of Functional Programming. Pearls were later added as submission categories for ICFP and POPL.
A General Constraint-Centric Scheduling Framework for Spatial Architectures
Authors: Tony Nowatzki, Michael Sartin-Tarm, Lorenzo De Carli, Karthikeyan Sankaralingam, Cristian Estan, Behnam Robatmili
Venue: PLDI 2013
Scheduling of computation to hardware resources is absolutely critical to the success of spatial architectures, but it’s a very tough problem. The authors present a general framework for scheduling on spatial architectures, and evaluate that framework on three diverse architectures with impressive results w.r.t. specialized schedulers. This work is an impressive study of the scheduling problem presented by these architectures and will be of interest to PL and architecture people alike.
CLAP: Recording Local Executions to Reproduce Concurrency Failures
Authors: Jeff Huang, Charles Zhang, Julian Dolby
Venue: PLDI 2013
CLAP is an efficient technique for reproducing concurrency bugs that logs thread local execution and computes memory dependencies offline, allowing an observed concurrency bug to be reproduced. The principal contribution is the identification of a new approach to reproducing concurrency bugs that reduces overheads and reduces the observer effect seen in existing systems that inject synchronization points.
From Relational Verification to SIMD Loop Synthesis
Authors: Gilles Barthe, Juan Manuel Crespo, Sumit Gulwani, Cesar Kunz, Mark Marron
Venue: PPoPP 2013
This paper addresses a compiler optimization problem, namely generating SSE vector code for loops with data-driven control flow and suboptimal data layout, with an interesting and unexpected combination of techniques. The initial loop restructuring step only generates pre- and post-conditions for various code fragments. A synthesis algorithm is then used to fill these fragments in. This paper won the PPoPP best paper award and was recommended by Saman Amarasinghe, one of the PC Chairs.
SIMD Parallelization of Applications that Traverse Irregular Data Structures
Authors: Bin Ren, Gagan Agrawal, Jim Larus, Todd Mytkowicz, Tomi Poutanen, Wolfram Schulte
Venue: CGO 2013
Finding ways to parallelize important computations is now essential in creating a competitive product or service. This paper addresses the challenging problem of mapping a SIMD model of computation onto an irregular data structure and demonstrates the effectiveness of this approach for two important classes of problems: regular expression matching and evaluating forests of decision trees. The paper shows that with the proper intermediate language abstraction, such irregular computations can be mapped efficiently onto SIMD units and result in speedups of over 10 times for random forest evaluation mapping the computation onto the SSE extensions of the x86 instruction set on a single core machine. These improvements result both from parallelizing the irregular computation and carefully laying out the data being manipulated to reduce the memory latency. By showing that the approach applies to two large interesting classes of irregular computations, the authors demonstrate the generality of their approach its potential for broad impact.
Set-Theoretic Foundation of Parametric Polymorphism and Subtyping
Authors: Giuseppe Castagna, Zhiwu Xu
Venue: ICFP 2011
The paper provides a semantic basis for the combination of subtyping and parametric polymorphism (generics in Java parlance) that has eluded researchers for quite some time. The paper builds on the more “intuitive” types-as-sets notion, in a principled way, and explains the issues with past attempts and give intuition for the presented models. It has immediate application in typing XML, but appears to be much broader than that. The problem of semantic definition of types is generally an important and challenging one where we expect progress to come slowly. This paper seems to represent an important next step.
Authors: David Terei, Simon Marlow, Simon Peyton Jones, David Mazières
Venue: Haskell Symposium 2012
This paper describes a language extension of Haskell that is intended to make it possible to confine and safely execute untrusted, possibly malicious code. It leverages Haskell’s type system, and imposes relatively few requirements on the development of Haskell programs. There’s also a “Safe Haskell” subset of the language being identified that has particularly nice properties, and the compiler is able to automatically infer whether programs are in this subset. The design and approach of Safe Haskell should be of interest to the wider CACM community, even though (or in particular, because) some of Safe Haskell’s ideas are made possible because Haskell is already a relatively safe language to start with.
Work-Stealing Without the Baggage
Authors: Vivek Kumar, Daniel Frampton, Stephen M. Blackburn, David Grove, Olivier Tardieu
Venue: OOPSLA 2012
This paper takes a well-known idea, work-stealing to exploit parallel hardware, and asks why it doesn’t work as well as it should. It identifies the key sources of overhead, then conducts a substantial empirical study of 4 alternative strategies for avoiding that overhead. The result is that one of these techniques is shown to incur low-overhead (15%), while enabling just as much parallelism as the others - more bang for the buck. This is a reconsideration of a known idea that improves on it substantially.
AutoMan: A Platform for Integrating Human-Based and Digital Computation
Authors: Daniel W. Barowy, Charlie Curtsinger, Emery D. Berger, Andrew McGregor
Venue: OOPSLA 2012
There has been a lot of buzz about mixtures of machine and human computation. For the most part there has been little programmatic support for constructing such systems. This paper makes an initial step in that direction. It defines an API that permits crowd-source computation to be invoked from a program like ordinary function calls. This is coupled with a runtime system that manages the execution of the crowd-sources computation, determines when sufficient confidence in the results is achieved, manages things like repricing and restarting crowd-source tasks, and seeks to maximize parallelism in the execution of the crowd-source tasks. This is an interesting new idea that provides a direction for people to build on.
Published as AutoMan: A Platform for Integrating Human-Based and Digital Computation in June 2016 CACM Research Highlight, with Technical Perspective: Computing with the Crowd by Siddharth Suri.
On the Linear Ranking Problem for Integer Linear-Constraint Loops
Authors: Amir Ben-Amram, Samir Genaim
Venue: POPL 2013
The paper studies the complexity of the linear ranking problem, namely the problem that, given a loop described as a set of linear constraints over a finite set of integer variables, determines if there exists a linear ranking decreasing function for that loop. The existence of this function implies termination, although the converse may not hold. The great contribution is set by moving from rational/real valued variables, which is PTIME, to integer valued variables where the decidability of the problem was still open. The authors proved that the Linear Ranking Problem for Integer variables is decidable and it is coNP-complete. Algorithmic methods for synthesizing linear ranking functions have been introduced and restrictions have been studied in order to make the problem PTIME.
And Then There Were None: A Stall-Free Real-Time Garbage Collector for Reconfigurable Hardware
Authors: David F. Bacon, Perry Cheng, Sunil Shukla
Venue: PLDI 2012
This is a very forward-looking paper. The authors see a future where FPGAs are an important substrate for general purpose computing, and they address the problem of garbage collection for programs realized in FPGA. The paper presents the first completely hardware implementation of a garbage collector (as opposed to hardware-assist) and the first garbage collector which completely eliminates interference to the mutator. The paper is broad in scope (hardware, software, empirical, semi-formal), and is developed from a fundamental, first-principles background. The paper is accessible yet not incremental.
Published as And Then There Were None: A Stall-Free Real-Time Garbage Collector for Reconfigurable Hardware in December 2013 CACM Research Highlight, with Technical Perspective: The Cleanest Garbage Collection by Eliot Moss.
SuperC: Parsing All of C by Taming the Preprocessor
Authors: Paul Gazzillo, Robert Grimm
Venue: PLDI 2012
This paper addresses the long standing problem of correctly parsing both of the two underlying languages that comprise what we know as C: C proper, and the C preprocessor. Most tools do not attempt to parse both. Most C compilers parse a single configuration of the program rather than the full set of possible configurations. The authors provide a nice overview of the challenges of parsing C and explore this concretely with the challenging example of the Linux kernel. They introduce ‘Fork-Merge LR (FMLR)’ parsing which they use to address the challenge of parsing across multiple configuration possibilities. The outcome is a publicly available tool, ‘SuperC’. This paper is accessible to a broad audience, it has concrete applications, and has some very nice computer science behind it. The work has clear applications, addresses a very well established problem, and is bound to be a milestone publication.
Efficient Lookup-Table Protocol in Secure Multiparty Computation
Authors: John Launchbury, Iavor S. Diatchki, Thomas DuBuisson, Andy Adams-Moran
Venue: ICFP 2012
Cloud computing provides new opportunities for individuals to cheaply take advantage of tremendous computing power for occasional expensive computations, but the capability comes with new challenges. This paper specifically focuses on privacy. In particular, you might imagine encrypting your private before sending it to the cloud computers, in order to ensure secrets are not leaked. But how can a cloud computer compute on encrypted data? The SMC (secure multi-party computation) community has studied this challenge and has come up with algorithms that can successfully operate on encrypted data, without revealing that data to the cloud computers that are actually doing the computation. Unfortunately, their algorithms are typically expressed via circuits, making an efficient implementation more challenging than it should be. This paper shows how a languages-based perspective can bring powerful tools to bear on the problem. Indeed, the paper shows how to get order-of-magnitude improvements to the implementation of AES encryption in an SMC setting. Even better: the overall structure of the AES implementation is similar to the non-SMC version. One key technical development that makes this possible is a new array lookup primitive that provides good leverage for expressing the algorithm and its efficient implementation.
Experience Report: A Do-It-Yourself High-Assurance Compiler
Authors: Lee Pike, Nis Wegmann, Sebastian Niller, Alwyn Goodloe
Venue: ICFP 2012
This exemplary paper explains how a variety of techniques that are understood in the FP community can be applied to build domain-specific embedded languages with evidence that the compiler is correct. The techniques used include using a Turing-complete host language with a Turing-incomplete hosted language, static and dynamic type checking, the QuickCheck property-based verification tool, model checking, and aiming for a verifying rather than a verified compiler. These techniques were applied to build a language, Copilot, for verification monitors used in aviation, an application of great interest in its own right.
Fast Restore of Checkpointed Memory using Working Set Estimation
Authors: Irene Zhang, Alex Garthwaite, Yuri Baskakov, Kenneth C. Barr
Venue: VEE 2011
The paper describes a clever technique for restoring checkpointed OS-level virtual machines, which significantly outperforms both eager (restore all memory pages, then start VM) and lazy (page in as needed) techniques. While the checkpoint is computed, page accesses are tracked for roughly the time required to restore the VM. Accessed pages are restored eagerly, the rest lazily. If everything were entirely deterministic, this should restore exactly the needed pages eagerly, and it seems to come close. This is a VMWare paper, and I believe the technique is used in practice. It is in many ways the ideal CACM paper: It exposes the reader to a probably new problem space, the paper is easy to read, and probably unusually easy to shorten to the CACM page limit. It is a simple but interesting and meaningful result. It also exposes the reader to the “minimum mutator utilization” metrics from the GC literature, which may themselves be interesting to some readers.
SugarJ: Library-based Syntactic Language Extensibility
Authors: Sebastian Erdweg, Tillmann Rendel, Christian Kastner, Klaus Ostermann
Venue: OOPSLA 2011
This paper appeared in OOPSLA 2011, pages 391-406. It addresses the problem of adding domain-specific language extensions, including domain-specific syntax extensions, to an existing programming language through the use of library modules. The main claim is that the approach “maintains the composability and scoping properties of ordinary libraries” (abstract). The mechanism is also self-applicable, which allows one to write syntax extensions that help the process of writing syntax extensions, and it also composes at this meta-level.
The approach builds on Java, SDF, and Stratego. There is also a novel incremental parsing technique that supports changing syntax within a source file. Composition of syntax extensions works through the module import mechanism of the host language. There is, however, a caveat that grammars do not always compose, so there is a possibility that importing two syntax extensions may cause an error. In summary, the paper both combines a number of existing techniques (from SDF and Stratego) in a very elegant and well-thought out language design.
The paper has a nice demonstration of the ideas, with examples that will be well received by a CACM audience. One example shows how to write a layered XML extension, including an embedding of XML Schema. The XML Schema extension shows how context-sensitive checks (like type checking) can be accomplished in their framework.
Support for domain-specific languages is something that has been gaining steam in the PL community for years, and it seems like this paper can help bring these ideas to a wider audience. In particular, an idea from earlier work (on MetaBorg) is adapted that allows the writer of a syntax extension to use the concrete syntax of a syntax extension in the description of how to transform the extension into the underlying language (such as Java). This is itself a nice demonstration of the self-applicability of the syntax extension mechanisms and how such syntax extension mechanisms can be implemented as a library (see section 5.1). Another nice feature of the design is the well thought out way that extensions are staged during processing of a compilation unit (file). This process, described in section 4.2, doesn’t seem too deep, and may seem obvious in hindsight, but it’s the kind of thing that can have a long lasting impact on the field, as it just seems right. Brought together, all these mechanisms seem to make a fundamental difference, and seem to represent a leap forward in syntactic extensibility. In particular, the way that the mechanism is demonstrated to be self-applicable is extremely impressive. Furthermore, this self-applicability allows for an virtuous cycle of improvements in program libraries, as is shown to some extent in the paper with the implementation of support for concrete syntax transformations.
Understanding POWER Multiprocessors
Authors: Susmit Sarkar (University of Cambridge), Peter Sewell (University of Cambridge), Jade Alglave (Oxford University), Luc Maranget (INRIA), Derek Williams (IBM)
Venue: PLDI 2011
An overlapping set of authors wrote an earlier paper entitled “x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors”, published in July 2010 as a Research Highlight. That paper describes the easy-to-understand, but formally describable, and empirically tested memory model for x86. This paper describes the analogous effort for IBMs Power architecture.
The paper gives a high-level description of some of the tricky behavior exhibited by Power and how to understand it. The paper builds an executable abstract machine and compares runs of the abstract machine with runs on Power hardware for a large suite of “litmus tests”. Execution of the abstract machine is nontrivial and involves a search for all possible executions. The paper explains example programs and tricky behavior via simple diagrams, and it explains the abstract machine with words, while the formal definitions are all in a technical report. The paper is an impressive tour-de-force of what programming language technology can do to help understand a complicated processor. For example, the authors defined their own language for specifying abstract machines, along with a translator to executable OCaml code and other tools. Additionally, the authors did a good job of making the abstract machine as simple as possible, in the face of a massively complicated processor.
This provides an interesting contrast to the x86 paper. The x86 paper gives rise to a model (essentially TSO) simple enough to be described operationally in a way that is probably useful to a typical assembly programmer. The Power model is inherently much more complex, in that the architecture exposes more speculation, and provides a very diverse set of memory fence instructions, with very subtle semantics. The present paper derives a model that is also mathematically precise, but probably too complex for a typical programmer. Nonetheless, it fills an important void, and finally answers important questions that have been creating confusion for years. A major question throughout the development of the latest C and C++ standards was whether the provided atomic operations were implementable on PowerPC, and at what cost. This allows those questions to be precisely answered, and finally allows a compiler writer to confidently and correctly hide the architectural complexity. If the C++ standards committee had had access to such a description, we would have avoided months of debate.
Authors: Peter Hawkins (Stanford University), Alex Aiken (Stanford University), Kathleen Fisher (Tufts University), Martin Rinard (MIT), Mooly Sagiv (Tel-Aviv University)
Venue: PLDI 2011
This paper describes a system that synthesizes C++ data structures and query/update algorithms from a declarative description. The motivation is that a programmer will be more likely to reason effectively about higher-level, declarative specifications and automatically-generated code. The approach is partially automated, but requires user input to select the data representation–although the paper experiments with auto-tuning to select the representation.
Published as An Introduction to Data Representation Synthesis in December 2012 CACM Research Highlight, with Technical Perspective: High-Level Data Structures by Yannis Smaragdakis.
Automated Atomicity-Violation Fixing
Authors: Guoliang Jin (University of Wisconsin-Madison), Linhai Song (University of Wisconsin-Madison), Wei Zhang (University of Wisconsin-Madison), Shan Lu (University of Wisconsin-Madison), Ben Liblit (University of Wisconsin-Madison)
Venue: PLDI 2011
This paper presents AFix, a development tool that uses the results of an atomicity violation detector (in this case, CTrigger) to automatically construct patches that fix the detected violations. It does so by placing critical sections around accesses that might cause atomicity violations (using various simple static analyses), and then merging sections together to reduce overhead and the likelihood of deadlock. On several real-world bugs, AFix is able to either completely fix or partially mitigate the effect of atomicity bugs, often better than manual patches. This paper was well motivated (the example in the introduction was very nicely chosen to illustrate the pertinent issues), well written, and explained a set of intuitive techniques clearly. In addition, the results seemed very promising. The strategy presented by the paper is fairly simple, but surprisingly effective, at least for fixing bugs that fall under its failure model. While it’s easy to see many sources of potential inefficiency and incompleteness in the proposed algorithm, the empirical results make the case that the technique can fix real bugs with negligible overhead. This is one of the first papers to attack the problem of automated bug fixing, so it should be of wide interest.
Automating String Processing in Spreadsheets Using Input-Output Examples (POPL 2011) and Spreadsheet Table Transformations from Examples (PLDI 2011)
Authors: POPL 2011: Sumit Gulwani; PLDI 2011: William Harris, Sumit Gulwani
Venue: POPL 2011; PLDI 2011
The committee nominates this paper (POPL 2011) together with the follow-up paper in PLDI 2011 by William Harris and Sumit Gulwani. Each of the papers is a great paper, and we believe they would be even better together as a combined CACM article. The theme of the papers is program synthesis in service of the general population. Specifically, the papers develop synthesis techniques that make it easier to use spreadsheets. The POPL 2011 paper focuses on string manipulation, while the PLDI 2011 paper focuses on table manipulation. Both papers enable “programming by example” in the sense that a user can show, with the existing functionality of the spreadsheet, an example of what needs to be done, and the implemented tool will then generate code for doing that kind of thing and apply it to all other instances in the table. If the result isn’t what the user wanted, then one can simply iterate the process. Both papers develop new program synthesis techniques and demonstrate via implementations and experiments that they work well in practice. These papers have broad appeal and show nicely that tools that are based on programming language techniques can be widely useful.
Published as Spreadsheet Data Manipulation Using Examples in August 2012 CACM Research Highlight, with Technical Perspective: Example-Driven Program Synthesis for End-User Programming by Martin C. Rinard.
Lime: a Java-Compatible and Synthesizable Language for Heterogeneous Architectures
Authors: Joshua Auerbach (IBM Research), David F. Bacon (IBM Research), Perry Cheng (IBM Research), Rodric Rabbah (IBM Research)
Venue: OOPSLA 2010
This paper describes a new language, Lime, that is designed to be compiled to and executed on a “range of architectures, from FPGAs to conventional CPUs.” The language also features parallelism at many different granularities. The key to the various optimizations given is the distinction between stateful and stateless computations. The paper has a novel notion of purity which is declared and checked in a novel way that is modular: by splitting the notion of purity into the concepts of valueness and locality. Various types can be declared as value types, which must have the property of being “deeply immutable.” Locality means “that no side effects can be propagated through static fields or information reachable from them.” Both “local static” methods whose arguments are all value types and “local instance” methods whose arguments are all of value types are pure functions. The language also has a kind of polymorphism that allows the locality of a method to depend on the context in which it invoked (called “glocal”).
Lime supports fine-grained parallelism (on value types) through collective operations, which can be user-defined. Types such as bitstrings and small arrays can be made data-parallel.
Lime also supports stream computing using tuple types and its “task” and “connect” mechanisms. The connect operator (written “=>”) connects an output stream in a task to another task’s input stream. Such connections are atomically buffered to help with flow control. (There is also a technique that can be used for rate matching.) To ease the connection between streams produced by tasks, Lime has a “match” operator and also statically checks connections to prevent “errors and scheduling complications”. However, it does have ways to send messages upstream or downstream to aid in configuration. All these mechanisms merely describe a dataflow graph; starting the computation is done separately, allowing the compiler to output the dataflow computation to hardware and providing the opportunity for various optimizations.
Tasks can also be used for coarse-grained parallelism with its “split” and “join” primitives, that operate on streams.
Lime’s support for compilation to various kinds of hardware is also aided by a mechanism that can declare when a set of types is complete. This is done by allowing classes and interfaces to name all their direct subtypes. The evaluation is impressive, with C and FPGA executions performing orders of magnitude faster compared to the baseline code, although the authors say that their FPGA backend is not completely mature.
Nikola: Embedding Compiled GPU Functions in Haskell
Authors: Geoffrey Mainland (Harvard University), Greg Morrisett (Harvard University)
Venue: Haskell Symposium 2010
This paper provides an excellent overview of the state of the art of domain-specific language (DSL) programming within a functional language such as Haskell. As running example, it introduces Nikola, a domain-specific language embedded in Haskell. Nikola is suitable for expressing first-order array computations. Nikola programs can thus be written within Haskell, having all the high-level features and the type system at the disposal. Despite being embedded, the Haskell compiler has access to a representation of the Nikola program, can analyze its sharing structure, can invoke a Nikola-specific compiler behind the scenes, generate code for the CUDA framework and have the program run on the GPU. The paper demonstrates that advantages of standalone DSLs and embedded DSLs can be combined without imposing any extra burden on the programmer. The authors show preliminary performance tests that indicate that it is possible to achieve the performance of hand-written CUDA code when starting from a Haskell program. The paper is interesting to people interested in DSLs because it provides a good overview of typical problems arising when embedding DSLs such as loss of sharing, and it gives pragmatic explanations how these problems can be avoided. It furthermore demonstrates that even embedded DSLs can generate efficient code, and therefore allow users of otherwise rather low-level languages such as CUDA to work in higher-level languages instead.
The Impact of Higher-Order State and Control Effects on Local Relational Reasoning
Authors: Derek Dreyer (MPI-SWS); Georg Neis (MPI-SWS); Lars Birkedal (IT University of Copenhagen)
Venue: ICFP 2010
Understanding if two programs are equivalent is an essential part of reasoning about the correctness of program transformations. This paper applies the modern technique of step-indexed Kripke logical relations to the old problem of reasoning about program equivalence in the presence of expressive language constructs, such as recursive types, abstract types, general references and call/cc. It also discusses how reasoning ability changes under restrictions to the language, giving examples properties that are true only if the language lacks higher-order state or control effects such as call/cc. Although the mathematical machinery necessary for this work is highly technical at times, much of the presentation in the paper is given in an informal, pedagogic style based on clean “visual” proof sketches.
Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs
Authors: Tiark Rompf (École Polytechnique Fédérale de Lausanne), Martin Odersky (École Polytechnique Fédérale de Lausanne)
Venue: GPCE 2010
The paper introduces “Lightweight Modular Staging” (LMS), a technique for expressing multi-stage programming. Staged programming allows control over the time when a piece of code is executed. In particular, code can be executed at compile time already, thereby allowing a disciplined form of program generation. Most programming languages that support staging require explicit source code annotations that delay or force the execution of a marked piece of code. LMS is different in that it does not have annotations on the term level: instead, simply changing the type signatures of a program fragment is sufficient to lift that program to a staged version. Additional optimizations can then be implemented separately (using Scala’s “trait”s), without touching the original code. LMS is particularly attractive for implementing embedded domain-specific languages (DSLs), because the staging features can be used to perform code optimizations in the background and the lightweight syntax keeps the surface syntax clean. This is confirmed by the fact that the Stanford Pervasive Parallelism Lab is already using LMS successfully to build domain-specific languages. The paper is a good example how modern language technology can help to solve long-standing and difficult problems such as staging: with Scala’s general-purpose features, the staging system as described in this paper can be implemented as a library, without need for further language extensions.
Published as Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs in June 2012 CACM Research Highlight, with Technical Perspective: The Fox and the Hedgehog by Peter Lee.
Authors: Viktor Kuncak, Mikaël Mayer, Ruzica Piskac, Philippe Suter
Venue: PLDI 2010
Synthesizing program fragments from high-level specifications has the potential to make programs more readable and programmers more productive. The problem has many challenging elements, however, including how to keep specifications concise, how to convert these specifications into code, and how to integrate such synthesis into general-purpose programming languages. This paper provides a key insight: decision procedures that generate witnesses can form the basis for synthesis. As a case study, the paper develops this idea for two useful decision procedures: linear arithmetic and constraints over sets. Further, the authors show how to incorporate such synthesis into Scala as a form of pattern matching. The resulting programs are concise and easier to understand than the versions without synthesis. The decision procedures are run at compile time to synthesize the desired code, and if no solution exists, the tool reports the failure statically. In summary, this paper describes a cool idea that could substantially change programming practice.
Published as Software Synthesis Procedures in February 2012 CACM Research Highlight, with Technical Perspective: Compiling What to How by Rastislav Bodik.
Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System
Authors: Jean Yang, Chris Hawblitzel
Venue: PLDI 2010
This paper describes Verve, an operating system running on standard hardware that is verified to ensure type and memory safety. The verification is automatic, using a combination of two different layers. Due to this combination, a large part of the operating system kernel can actually be written in a high-level programming language (here: C#). The automatic nature of the verification brings verification times down to a number of minutes. This means that for the first time it is possible to develop extension of the system on the fly, and to perform quick experiments with new designs. The paper presents an entertaining and accessible introduction to the various issues that are involved in building such a system. Verve illustrates that building much more robust and secure operating systems is feasible. Given the widespread vulnerability of existing operating systems, Verve is of broad interest to the CACM community because it shows that alternatives are possible.
Published as Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System in December 2011 CACM Research Highlight, with Technical Perspective: Safety First! by Xavier Leroy.
FastTrack: Efficient and Precise Dynamic Race Detection
Authors: Cormac Flanagan, Stephen N. Freund
Venue: PLDI 2009
Past work on dynamic data race detection had to choose between fast but imprecise race detectors that report false alarms vs. slow but precise race detectors that never report false alarms. The latter category is typically based on classical “vector clock” operations, where a single operation requires time linear in the number of program threads. The key insight behind the FastTrack dynamic data race detection algorithm introduced in this paper is that the full generality of vector clocks is not necessary in over 99% of these read and write operations in practice. Instead, a more lightweight representation of the happens-before information can be constructed using “epochs”. Experimental results on Java benchmarks show that this approach is an order of magnitude faster than a traditional vector-clock race detector, thereby representing a significant advance over classical work in this area. This paper includes a comprehensive comparison with past work, including the previously nominated “Goldilocks” paper. Though both papers address the same high-level problem, they take very different approaches. FastTrack has shown how past algorithms based on vector clocks can be made significantly more efficient and scalable, thereby providing a perspective on a large body of past work in this area. Another SIGPLAN-nominated Research Highlights paper, “Goldilocks”, is also highly innovative and takes a different approach that is not based on vector clocks.
Published as FastTrack: Efficient and Precise Dynamic Race Detection in November 2010 CACM Research Highlight, with Technical Perspective: Data Races are Evil with No Exceptions by Sarita Adve.
The Theory of Deadlock Avoidance via Discrete Control
Authors: Yin Wang, Stephanie Lafortune, Terence Kelly, Manjunath Kudlur, Scott Mahlke
Venue: POPL 2009
Deadlock is a difficult problem for the average programmer who must now confront multicore hardware. The Gadara project offers a very interesting possible solution. The attached POPL paper provides a theoretical foundation for dynamic deadlock avoidance in concurrent programs that employ conventional mutual exclusion and synchronization primitives. Beginning with control flow graphs extracted from program source code, it constructs a formal model of the program and then applies Discrete Control Theory to automatically synthesize deadlock avoidance control logic that is implemented by program instrumentation. At run time, the control logic avoids deadlocks by postponing lock acquisitions. Discrete Control Theory guarantees that the instrumented program cannot deadlock. The control logic is maximally permissive: it postpones lock acquisitions only when necessary to prevent deadlocks, and therefore permits maximal runtime concurrency. Additonal publications on the prototype system have appeared in top fora in DCT, systems, and programming languages: WODES’08, OSDI’08. The OSDI paper is attached as well. This research is a very nice combination of rigor and novelty. The authors have a working prototype and have applied it to real software including Apache, BIND, and OpenLDAP; it works as advertised, with very tolerable performance overhead. This approach potentially makes a significant contribution to enhancing the programmability of multicore processors for the mainstream.
Generic Discrimination: Sorting and Partitioning Unshared Data in Linear Time
Author: Fritz Henglein
Venue: ICFP 2008
Everyone knows that sorting a collection of n elements requires a number of comparisons proportional to n log n. Distribution sort over certain kinds of data works in time linear in the number of elements, but is a “specialty” sort, useful only in special circumstances. After reading this paper, you may reverse this idea. Henglein shows how to generalize distribution sort to a wide range of data. Assuming a base type is suitable for distribution sort, he shows how products, sums, and lists of such data are also suitable for distribution sort, relegating n log n sorts to the “specialty” case. Henglein argues convincingly that the right interface for an abstract data type to supply is not comparison and not sorting, but a mild generalization of sorting which he calls a discriminator, in terms of which the other two may be defined. While the technique may be applied in any programming language, it fits particularly well with a functional language where it is easy to compose sort operators on component types to build sort operators on structures, and where advanced type systems allow one to ensure type safety while doing so. Henglein presents complete code in Haskell. We should expect the technique to spread to other languages; the new notion of C++ concepts, which resemble Haskell type classes, are well suited for supporting defining functions based on type structure. The paper is a joy to read.
Some readers may be confused by a line in the abstract and by some of the material in Section 8.1. Clarifications for these appear below. The line in the abstract “Having only a binray equality test on a type requires Θ(n2) time to find all the occurrences of an element in a list of length n, for each element in the list” may confuse some. It may be clearer to say, “Having only a binary equality test on a type requires quadratic time to partition a list into groups of equal elements”.
Table 1 in Section 8.1 may confuse some, because it gives times on the assumption that comparison of two elements of size |v| and
- |w|* requires time proportional to |v|+|w|, whereas it is much more commonfor comparing elements to require time proportional to min(|v|,|w|). For example, lexical comparison of two strings of length |v| and |w| requires the latter time, while reversing two strings (a linear-time operation on linked lists) and then performing a lexical comparison requires the former time. The latter operation is inefficient – it would be better to reverse each string once, then sort using lexical comparison. But common libraries based on comparison operators may seduce a programmer into including reversal into the comparison operator, and the whole point of Table 1 is to point out the cost of such seduction.
Goldilocks: A Race and Transaction-Aware Java Runtime
Authors: Tayfun Elmas, Shaz Qadeer, Serdar Tasiran
Venue: PLDI 2007
The paper describes a precise and reasonably fast data-race-detection algorithm. Their implementation takes the form of a modified JVM that detects all data races as defined by the Java memory model, and throws an exception when a race is encountered. It also supports transactional memory. As far as I know, this is the first such implementation that is fast enough to at least raise the question of whether this could be done routinely as part of production code execution. Although the performance measurements in the paper are not such that we can declare this a solved problem, I think this paper points at an interesting possible solution path to two hard (and, thanks to the widespread adoption of multicore processors) critical parallel programming problems: 1. Dealing with data races is the most difficult aspect of parallel program debugging. Since such bugs are usually not easily reproducible, they may escape testing, and may go unreported for long periods of time. Debugging may require months, since the actual data race is often no longer visible when a failure results, requiring the problem be reproduced repeatedly with increasingly instrumented programs. An exception at the time of the race avoids this problem. 2. (Not pointed out in the paper, since I think nobody appreciated it at the time.) It has become increasingly clear that we do not know how to fully specify the semantics of a multithreaded language like Java that, for security reasons, must specify some semantics for data races. The Java memory model (POPL 05) is the most successful attempt, but it has recently become clear that it too has some surprising and undesirable properties. If we could requrie that data races produce exceptions, these issues disappear. I believe there is now some ongoing work that looks at hardware support for data-race detection, that might greatly reduce the performance costs, and eventually make the “data-race as exception” model viable for a much larger class of applications. This paper is a great first step toward something that might actually result in a major advance in shared memory parallel programming, whether it uses transactional memory, or continues to be based on locks.
Published as Goldilocks: A Race-Aware Java Runtime in November 2010 CACM Research Highlight, with Technical Perspective: Data Races are Evil with No Exceptions by Sarita Adve.
Sound, Complete and Scalable Path-Sensitive Analysis
Author: Isil Dillig, Thomas Dillig, Alex Aiken
Venue: PLDI 2008
This paper appeared in PLDI 2008 and was coauthored by Alex Aiken and two of his Ph.D. students, Isil Dillig and Thomas Dillig. It provides an elegant and scalable solution to a decades-old challenge in the area of program analysis viz., how to perform precise intra- and inter-procedural path-sensitive program analysis without incurring the exponential increase in space and time that arises in a naive code duplication approach? And how to perform such an analysis for recursive programs? They introduce a new approach based on constraint systems to address this problem by defining constraints that capture the exact path- and context-sensitive condition under which a program property holds. The resulting constraint system may not be solvable in general. However, to answer a “may” or a “must” query regarding a program property, they can extract necessary or sufficient conditions from the constraint system which can be solved i.e., they specialize the exact path condition to the query being asked. Further, these extracted necessary/sufficient conditions are typically small enough to make the approach scalable to large software – they have implemented this approach in the Saturn system and used it to analyze over 6MLOC in the entire Linux kernel. They also use “unobservable variables” to model interactions with the environment (such as user input) as well as to support modular program analysis. Their first set of experimental results quantify the size of necessary and sufficient conditions for pointer operations. Their second set of results applies their approach to identifying null-dereference errors. Both experiments were performed on OpenSSH, Samba, and the Linux kernel. The results for null-dereference analysis in Figure 7 compare inter- and intra-procedural path-sensitive analysis with just intra-procedural path-sensitive analysis, and shows that the former results in order-of-magnitude reductions in false positive reports compared to the latter. Like my other nominations, I believe that the authors can do a good job of rewriting the content of this paper for the general CACM audience. Some of the formalism in the current version of the paper may not be accessible to a large audience, and will need to be toned down in the CACM version. However, the basic concepts of inter- and intra-procedural path-sensitive analysis can be conveyed by simple examples such as the queryUser() example discussed in Section
- This paper is also a good illustration of how far approaches such as constraint systems have progressed for program analysis tools.
Published as Reasoning About the Unknown in Static Analysis in August 2010 CACM Research Highlight, with Technical Perspective: Large-Scale Sound and Precise Program Analysis by Fritz Henglein.
Formal Certification of a Compiler Back-end or: Programming a Compiler with a Proof Assistant
Author: Xavier Leroy
Venue: POPL 2006
In 2003, JACM published Tony Hoare’s grand challenge paper on The Verifying Compiler. A verifying compiler guarantees correctness of a program before running it. Meeting such a challenge would mean that researchers have to make great progress on both how compilers are written and on verification technology. Leroy’s paper presents the first *formally-verified* proof of the correctness of a *complete compilation chain*. “Formally verified” means that the entire proof is verified by the Coq proof assistant. “Complete compilation chain” means that, instead of verifying an isolated transformation or optimisation, Leroy has verified the whole chain of a lightly optimising compiler from a (deliberately modest) high level language called CMinor to PowerPC machine code, by way of four different intermediate languages. The paper shows how to certify that when the compiler generates code, then safety properties proved on the source program will hold for the executable code as well. We believe many researchers and practitioners will be interested in a glimpse of the progress on verifying software we all depend on.
Published as Formal Verification of a Realistic Compiler in July 2009 CACM Research Highlight, with Technical Perspective: A Compiler’s Story by Greg Morrisett.
The Next 700 Data Description Languages
Authors: Kathleen Fisher, Yitzhak Mandelbaum, David Walker
Venue: POPL 2006
Network administrators, financial analysts, scientists, and many others represent data in a myriad of ad hoc formats such as variable-width ASCII records and data-dependent number of fixed-width binary records. Those formats typically don’t have parsing, querying, analysis, or transformation tools readily available. Thus motivated, researchers (including Fisher et al) have developed a variety of ad hoc data description languages. What is still missing, though, is a semantics and meta-theory for such languages, or anything to play the role the lambda calculus plays in underpinning programming language design and implementation. That is the gap that this paper fills, by defining a core calculus called the Data Dependency Calculus, and giving it a formal semantics and meta-theory. The paper demonstrates a formal translation from one data description language (IPADS) into DDC, and illustrates how features from other languages might similarly be translated. This is the first time anyone has tried to give a formal treatment to the wild and woolly world of data description languages. We believe many researchers and practitioners will be interested in catching up with what programming language technology can offer for solving common real-world problems related to ad hoc data.
Authors: William N. Scherer III, Doug Lea, Michael L. Scott
Venue: PPoPP 2006
Describes “synchronous queues”, a fundamental data structure used to implement hand-offs or “rendezvous” between threads. This implementation was immediately integrated into the java.util.concurrent component of Sun’s standard Java library, and has thus already had practical impact. The data structure is commonly used with the Java Executor framework and, as is shown by measurements in the paper, may rather spectacularly impact the performance of some applications using it. This is in some sense the canonical data structure that involves blocking or waiting, and for which conventional sequential performance metrics are hence inapplicable. It’s a great example of how to design a data structure, for which, in a sense, only concurrency considerations matter. We believe this will expose readers to an area of increasing importance that is very different from the kind of algorithmic problems most of them will be familiar with. We have confidence that the authors will be able to retarget the paper at a much broader audience than that targeted by the current paper.
Published as Scalable Synchronous Queues in May 2009 CACM Research Highlight, with Technical Perspective: Highly Concurrent Data Structures by Maurice Herlihy.
Java Takes Flight: Time-portable Real-time Programming with Exotasks
Authors: Joshua Auerbach, David F. Bacon, Daniel T. Iercan, Christoph M. Kirsch, V. T. Rajan, Harald Roeck, Rainer Trummer
Venue: LCTES 2007
Java and real-time are terms that aren’t often used together. This paper changes that. Real-time systems programming involves low levels of abstraction, making real-time systems difficult to test and requiring them to be re-certified every time the software or hardware environment changes. This paper introduces Exotasks to Java to raise the level of abstraction. Exotasks add deterministic timing, even in the presence of other Java threads and across changes of hardware and software platform. The authors built a quad-rotor model helicopter, the JAviator, to test the concepts of the work. They tested the system with actual flights. When the system fails, the helicopter crashes! Beyond the richness of the technical details, this paper should appeal to CACM readers. There is a real problem, clever language solutions, and an interesting application demonstration complete with photographs. The work is conducted within the broader context of IBM’s effort to establish a common Java-based real time programming environment.
Fault-Tolerant Typed Assembly Language
Authors: Frances Perry, Lester Mackey, George A. Reis, Jay Ligattii, David I. August, David Walker
Venue: PLDI 2007
This paper was one of two best papers from PLDI 2007. It spans the areas of fault-tolerant computing, type systems, and production-strength compilers to provide a unique approach to formalization and detection of Single Event Upset (SEU) faults, motivated by the increasing number of transient hardware faults observed in real systems. Their approach builds on the general strategy of using redundancy for fault tolerance by executing two independent copies of the program in two separate threads. Prior to writing data out to a memory-mapped output device, the results of the two computations are checked for equivalence. (This research focuses on fault detection in processors, and assumes that other techniques such as error-correcting codes are used to achieve fault tolerance in memories.) They introduce a Fault-Tolerant Typed Assembly Language, TALFT, that ensures that all well-typed programs exhibit fail-safe behavior in the presence of transient faults i.e., a corrupt value can never be written to a memory-mapped output device. TALFT can help identify cases when conventional optimizations may not be sound from the viewpoint of fault tolerance. Finally, they implemented their reliability transformations in the Princeton VELOCITY compiler, and provided empirical results for 22 SPECint2000 and MediaBench benchmarks. These results show that the overhead of fault tolerance is about 34% on average instead of the 100% that may have been expected from dual redundancy. I especially like this paper for CACM because it spans a number of different areas, and I believe that the authors can do a good job of rewriting the content of this paper for the general CACM audience. The basic single-event-upset fault model is well motivated by current technology trends and yet simple to understand. Overall, the paper should be an eye-opener to folks who equate compilers with the Dragon book and haven’t been keeping up with the latest research in compiler technology such as the fault tolerance application studied in this paper.
Dynamic Multigrained Parallelization on the Cell Broadband Engine
Authors: Filip Blagojevic, Dimitris S. Nikolopoulos, Alexandros Stamatakis, Christos D. Antonopoulos
Venue: PPoPP 2007
With the advent of multicores, parallel programming is again one of the most important issues in computer science. This paper describes and evaluates a scheduling strategy for parallel programs executing on an IBM cell processor. It exploits coarse grain parallelism whenever there is an adequate supply of tasks. Otherwise, it attempts to exploit fine grain-parallelism. Fine grain parallelism is given lower priority because it tends to incur more overhead than coarse-grain parallelism. The target machine has an interesting, much talked about architecture. It is a heterogeneous machine, a likely characteristic of future multicores. The application is representative of an important class of applications that contain relatively simple parallelism at multiple levels. The paper describes an excellent case study in parallel programming that illustrates some of the immense difficulties that programmers are likely to face in the brave new world of multicore systems.
This paper was nominated by an ad hoc SIGPLAN Committee prior to the creation of the SIGPLAN CACM Research Highlights Nominating Committee.
Ur/Web: A Simple Model for Programming the Web
Author: Adam Chlipala
Venue: POPL 2015
Published as Ur/Web: A Simple Model for Programming the Web in June 2016 CACM Research Highlight, with Technical Perspective: Why Didn’t I Think of That? by Philip Wadler.
Verifying Quantitative Reliability for Programs that Execute on Unreliable Hardware
Authors: Michael Carbin, Sasa Misailovic, Martin C. Rinard
Venue: OOPSLA 2013
Published as Verifying Quantitative Reliability for Programs that Execute on Unreliable Hardware in June 2016 CACM Research Highlight, with Technical Perspective: Toward Reliable Programming for Unreliable Hardware by Todd Millstein.
Authors: Eric Schkufza, Rahul Sharma, Alex Aiken
Venue: ASPLOS 2013
Published as Stochastic Program Optimization in February 2016 CACM Research Highlight, with Technical Perspective: Program Synthesis Using Stochastic Techniques by Sumit Gulwani.
ELI: Bare-Metal Performance for I/O Virtualization
Authors: Nadav Amit, Abel Gordon, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, Dan Tsafrir
Venue: ASPLOS 2012
Published as Bare-Metal Performance for Virtual Machines with Exitless Interrupts in January 2016 CACM Research Highlight, with Technical Perspective: High-Performance Virtualization: Are We Done? by Steve Hand.
Cache and I/O Efficient Functional Algorithms
Authors: Guy E. Blelloch, Robert Harper
Venue: POPL 2013
Published as Cache Efficient Functional Algorithms in July 2015 CACM Research Highlight, with Technical Perspective: The Simplicity of Cache Efficient Functional Algorithms by William D. Clinger.
Continuity Analysis of Programs
Authors: Swarat Chaudhuri, Sumit Gulwani, Roberto Lublinerman
Venue: POPL 2010
Published as Continuity and Robustness of Programs in August 2012 CACM Research Highlight, with Technical Perspective: Proving Programs Continuous by Andreas Zeller.
Authors: David Van Horn, Matthew Might
Venue: ICFP 2010
Published as Abstracting Abstract Machines: A Systematic Approach to Higher-Order Program Analysis in September 2011 CACM Research Highlight, with Technical Perspective: Abstracting Abstract Machines by Olivier Danvy and Jan Midtgaard.
The Semantics of x86-CC Multiprocessor Machine Code (POPL 2009) and A Better x86 Memory Model: x86-TS (TPHOLs 2009)
Authors: Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, Magnus O. Myreen
Venue: POPL 2009 and TPHOLs 2009
Published as x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors in July 2010 CACM Research Highlight, with Technical Perspective: A Solid Foundation for x86 Shared Memory by Hans-J. Boehm.
Optimistic Parallelism Requires Abstractions
Authors: Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, L. Paul Chew
Venue: PLDI 2007
Published as Optimistic Parallelism Requires Abstractions in September 2009 CACM Research Highlight, with Technical Perspective: Abstraction for Parallelism by Katherine Yelick.
Exterminator: Automatically Correcting Memory Errors with High Probability
Authors: Gene Novark, Emery D. Berger, Benjamin G. Zorn
Venue: PLDI 2007
Published as Exterminator: Automatically Correcting Memory Errors with High Probability in December 2008 CACM Research Highlight, with Technical Perspective: Patching Program Errors by Martin C. Rinard.
Composable Memory Transactions
Authors: Tim Harris, Simon Marlow, Simon Peyton-Jones, Maurice Herlihy
Venue: PPoPP 2005
Published as Composable Memory Transactions in August 2008 CACM Research Highlight, with Technical Perspective: Transactions are Tomorrow’s Loads and Stores by Nir Shavit.
The DaCapo Benchmarks: Java Benchmarking Development and Analysis
Authors: Stephen M. Blackburn, Kathryn S. McKinley, Robin Garner, Chris Hoffmann, Asjad M. Khan, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar
Venue: OOPSLA 2006
Published as Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century in August 2008 CACM Research Highlight, with Technical Perspective: A Methodology for Evaluating Computer System Performance by William Pugh.