SOAP 2020: Proceedings of the 9th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis

Full Citation in the ACM Digital Library

SESSION: Keynote

Formal reasoning and the hacker way (keynote)

In 2013 I moved from to industry after over 25 years in academia, when Facebook acquired a verification startup, Monoidics, that I was involved with. In this talk I’ll recount the clash of cultures I encountered, where traditionally calm and cool formal reasoning techniques came in contact with a heated software development methodology based on rapid modification of large codebases (thousands of modifications per day on 10s MLOC). I will tell how we found that static formal reasoning could thrive, if certain technical approaches (based on compositionality), how the industrial experience caused me to question some of the assumptions I learned in academic static analysis, and how I’ve come out the other side with new science spurred by that experience (most recently, incorrectness logic). Overall, I hope to convey that having science and engineering playing off one another in a tight feedback loop is possible, even advantageous, when practicing static analysis in industry at present.

SESSION: Papers

TACAI: an intermediate representation based on abstract interpretation

Most Java static analysis frameworks provide an intermediate presentation (IR) of Java Bytecode to facilitate the development of static analyses. While such IRs are often based on three-address code, the transformation itself is a great opportunity to apply optimizations to the transformed code, such as constant propagation.

In this paper, we propose TACAI, a refinable IR that is based on abstract interpretation results of a method's bytecode. Exchanging the underlying abstract interpretation domains enables the creation of various IRs of different precision levels. Our evaluation shows that TACAI can be efficiently computed and provides slightly more precise receiver-type information than Soot's Shimple representation. Furthermore, we show how exchanging the underlying abstract domains directly impacts the generated IR.

Value and allocation sensitivity in static Python analyses

Sound static analyses for large subsets of static programming languages such as C are now widespread. For example the Astrée static analyzer soundly overapproximates the behavior of C programs that do not contain any dynamic code loading, longjmp statements nor recursive functions. The sound and precise analysis of widely used dynamic programming languages like JavaScript and Python remains a challenge. This paper examines the variation of static analyses of Python – in precision, time and memory usage – by adapting three parameters: the value sensitivity, the allocation sensitivity and the activation of an abstract garbage collector . It is not clear yet which level of sensitivity constitutes a sweet spot in terms of precision versus efficiency to achieve a meaningful Python analysis. We thus perform an experimental evaluation using a prototype static analyzer on benchmarks a few thousand lines long. Key findings are: the value analysis does not improve the precision over type-related alarms; the value analysis is three times costlier than the type analysis; the allocation sensitivity depends on the value sensitivity; using an abstract garbage collector lowers memory usage and running times, but does not affect precision.

Explaining bug provenance with trace witnesses

Bug finders are mainstream tools used during software development that significantly improve the productivity of software engineers and lower maintenance costs. These tools search for software anomalies by scrutinising the program's code using static program analysis techniques, i.e., without executing the code. However, current bug finders do not explain why bugs were found, primarily due to coarse-grain abstractions that abstract away large portions of the operational semantics of programming languages. To further improve the utility of bug finders, it is paramount to explain reported bugs to the end-users.

In this work, we devise a new technique that produces a program trace for a reported bug giving insight into the root cause for the reported bug. For the generation of the program trace, we use an abstracted flow-based semantics for programs to overcome the undecidability of the problem. We simplify the semantic problem by mapping an input program with a reported bug to a Constant Copy Machine (CCM) for the trace construction. Using CCM the semantics of the program can be weakened, and thus bug provenance can be solved in polynomial time, producing a shortest trace in the process which gives the shortest explanation. The technique is reified in the bug tracing tool Digger and is evaluated on several open-source Java programs.