Workshop on Research Highlights in Programming Languages

Workshop

09/11/2020: The schedule for the workshop is up; see the navigation tabs above. The workshop will take place online, concurrently with the main FSTTCS conference.

 

The Indian Association for Research in Computing Science (IARCS) announces the inaugural Workshop on Research Highlights in Programming Languages. The workshop will take place online over three days, concurrently with the main FSTTCS conference scheduled for Dec. 15–17, 2020. The focus of the workshop is on all areas of programming languages, including but not limited to program analysis and verification, applied formal methods, and compilers.

The objective of this workshop is to foster interactions among its attendees, and more broadly between researchers working on programming languages and the traditional FSTTCS community of researchers working on theoretical computer science and formal methods. In the long term, we hope it will evolve into a regular conference, or into a track integrated with the FSTTCS conference.

 

Call for proposals:

We solicit talk proposals on recent work that has been published at reputable venues, or that is mature in terms of approach and evaluation. Each talk will be around 30 minutes long. Proposals should highlight the novel research contribution and its relevance to the target community.

Submissions are now closed.

Submission deadline: Wednesday, 30 September 2020 (AoE)

 

Attending the Workshop:

There is no separate registration fee for the workshop; it is open to all attendees of FSTTCS.

 

Organizers:

Deepak D’Souza (Indian Institute of Science)

Uday P. Khedkar (Indian Institute of Technology – Bombay)

K. Narayan Kumar (Chennai Mathematical Institute)

K.V. Raghavan (Indian Institute of Science)

Aseem Rastogi (Microsoft Research, India)

 

Contact:

Please contact Raghavan (raghavan@iisc.ac.in) or Aseem (aseemr@microsoft.com) if you have any questions.

Schedule 15th Dec. 2020

Session 1: Verification, Session chair: Uday Khedker

 

3:30pm – 4:00pm        Verification of Math Libraries, Rahul Sharma, Microsoft Research India

Math libraries provide heavily optimized implementations of transcendental functions like exp, log, sin, etc. Since the typical implementations found in math.h/libm operate over a finite number of bits, they are necessarily approximate. Moreover, humans and superoptimizers generate even faster math library functions that trade off precision for performance. We describe an automated technique that bounds the deviation of a given math library implementation from the ideal result.
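The core question above, how far an approximate implementation can stray from the ideal result, can be illustrated with a small sketch. A truncated Taylor series stands in for an optimized-but-approximate library routine, and the "bound" is obtained by sampling; the talk's technique derives such bounds automatically and soundly, and all names here are invented for illustration:

```python
import math

# A six-term Taylor series stands in for an optimized-but-approximate
# library implementation of exp (invented for illustration).
def approx_exp(x):
    term, total = 1.0, 1.0
    for k in range(1, 7):
        term *= x / k
        total += term
    return total

# Estimate the deviation from the ideal result by sampling: a testing-style
# bound, whereas the talk's technique computes sound bounds automatically.
def max_relative_error(f, ref, lo, hi, samples=1000):
    worst = 0.0
    for i in range(samples + 1):
        x = lo + (hi - lo) * i / samples
        worst = max(worst, abs(f(x) - ref(x)) / abs(ref(x)))
    return worst

err = max_relative_error(approx_exp, math.exp, 0.0, 1.0)
```

On [0, 1] the truncation error of this six-term series is small but non-zero, which is exactly the kind of deviation a verifier must bound.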

4:00pm – 4:30pm        Verifying Band Convergence for Sampled Control Systems, Ezudheen P, IISc Bangalore

We present a method to verify transient and settling-time properties, called band convergence properties, of digitally controlled continuous systems, wherein we consider a linear dynamical system model for a plant and a PID controller. We consider the discrete-time sampled behaviour of the closed-loop system, and verify band convergence for the discrete-time behaviour. The basic idea is to look for a box-shaped invariant for the system that is adequate to ensure that the system stays within the given band. We first give a technique to handle a general discrete-time system with determinate matrix entries. We then give a technique to handle discrete-time systems whose matrix entries lie in a range that over-approximates the matrix exponential (such systems arise when we consider the discrete-time version of a continuous system), using the notion of an abstract discrete-time system. We have implemented the verification approach and evaluated its efficacy on some popular Simulink models.
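The key idea of a box-shaped invariant can be sketched concretely: for a linear update x' = Ax, the image of a box is the convex hull of the images of its corners, so checking that the box is invariant reduces to checking finitely many corners. The matrix and band below are invented examples, not taken from the talk:

```python
from itertools import product

# For a linear update x' = A x, the image of a box is the convex hull of
# the images of its corners, so a box-shaped invariant holds iff every
# corner of the box maps back inside it.
def maps_box_into_itself(A, box):
    for corner in product(*box):
        image = [sum(A[i][j] * corner[j] for j in range(len(corner)))
                 for i in range(len(A))]
        if not all(lo <= v <= hi for v, (lo, hi) in zip(image, box)):
            return False
    return True

A = [[0.5, 0.1],
     [-0.1, 0.5]]                      # a stable discrete-time system (example)
band = [(-1.0, 1.0), (-1.0, 1.0)]      # the band the state must stay within
invariant_holds = maps_box_into_itself(A, band)
```

An unstable update (say A = 2I) pushes corners outside the box, so the same check correctly rejects it.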

4:30pm – 5:00pm        A Simulator for State Chart Based Language, Advait Lonkar, IIIT Bangalore

Specification models provide the benefit of automatic verification, where defects in the software can be detected early in the development cycle. Statecharts are a visual formalism for such specification models with complex life-cycles. The statechart formalism ensures expressivity while maintaining simplicity, making it widely accepted in industry. Statechart Based Language (StaBL) is a specification language for modeling statecharts. The statecharts modeled in StaBL extend Harel statecharts with local variables and an action language. Like other state-based formalisms, StaBL statecharts are susceptible to various specification issues. Early identification of such issues is not only desirable but also aids effective debugging. As statecharts can grow large (hundreds of states and variables, multiple levels of hierarchy, etc.), manual identification of specification issues would be tedious and error-prone. We have built a simulator that captures the configuration of a statechart (modeled in StaBL) at each step, maintaining the value environments for the three kinds of variables StaBL provides (local, parameter, and static). The implementation follows the work of Chakrabarti et al. The simulation can proceed in three different modes: pre-event in steps, pre-event one-shot, and user-event in steps. The tool can run automated tests that identify points of various specification issues. This is especially useful when a small subset of inputs (from an otherwise very large set) is responsible for inducing the issue. In this talk, we discuss the main design and architectural details of our tool, and our results on test specifications involving the different modes of execution.
We also illustrate how the simulator can be used in the identification of the particular specification issue of non-determinism in a multi-level hierarchical configuration.

Session 2: Static Analysis, Session chair: K.V. Raghavan

 

5:15pm – 5:45pm        New Results in May Happen in Parallel (MHP) Analysis, V. Krishna Nandivada, IIT Madras

May-Happen-in-Parallel (MHP) analysis is becoming the backbone of many parallel analyses and optimizations. In this talk, we present new approaches to MHP analysis for X10-like languages that support async-finish-atomic parallelism. We present a fast incremental MHP algorithm to derive all the statements that may run in parallel with a given statement. We extend the MHP algorithm of Agarwal et al. (which answers whether two given X10 statements may run in parallel, and under what condition) to improve its computational complexity without compromising precision. We also present algorithms to efficiently compute MHP information on the fly as the program changes, say in the context of IDEs.
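As a toy illustration of MHP facts in the async-finish model, the sketch below computes may-happen-in-parallel pairs for a single flat finish block whose items are plain statements or asyncs over plain statements. This is a drastic simplification of the analyses described above (no nesting, atomics, or incremental updates), with invented program syntax:

```python
# Toy MHP computation for one flat finish block. Each item is either a
# plain statement or an async whose body is a list of plain statements.
def mhp_pairs(finish_items):
    pairs = set()
    for i, (kind_i, body_i) in enumerate(finish_items):
        for kind_j, body_j in finish_items[i + 1:]:
            if kind_i == "async" and kind_j == "async":
                # sibling asyncs under one finish may fully overlap
                pairs |= {(a, b) for a in body_i for b in body_j}
            elif kind_i == "async" and kind_j == "stmt":
                # an async overlaps statements that follow its spawn,
                # up to the end of the enclosing finish
                pairs |= {(a, body_j) for a in body_i}
            # a statement completed before an async is spawned is never
            # parallel with it, and plain statements are sequential
    return pairs

prog = [("stmt", "s0"),
        ("async", ["a1", "a2"]),
        ("stmt", "s1"),
        ("async", ["b1"])]
pairs = mhp_pairs(prog)
```

Here s0 runs before the first async is spawned, so it is parallel with nothing, while a1 and a2 may overlap both s1 and b1.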

5:45pm – 6:15pm        Generalized Points-to Graphs: A Precise and Scalable Abstraction for Points-to Analysis, Pritam Gharat, Imperial College UK

Computing precise (fully flow- and context-sensitive) and exhaustive (as opposed to demand-driven) points-to information is known to be expensive. Top-down approaches require repeated analysis of a procedure for separate contexts. Bottom-up approaches need to model unknown pointees accessed indirectly through pointers that may be defined in the callers, and hence do not scale while preserving precision. Therefore, most approaches to precise points-to analysis begin with a scalable but imprecise method and then seek to increase its precision. In contrast, we begin with a precise method and increase its scalability. We create naive but possibly non-scalable procedure summaries and then use novel optimizations to compact them while retaining their soundness and precision. We propose a novel abstraction called the generalized points-to graph (GPG), which views points-to relations as memory updates and generalizes them using counts of indirection levels, leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and the control flow between them. Their compactness is ensured by strength reduction (reducing the indirection levels), control flow minimization (eliminating control flow edges while preserving soundness and precision), and call inlining (enhancing the opportunities for these optimizations). The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision. As a result, GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158 kLoC of C programs. At a more general level, GPGs provide a convenient abstraction to represent and transform memory in the presence of pointers.

6:15pm – 6:45pm        Bidirectionality in flow-sensitive demand-driven analysis, Swati Jaiswal, VNIT Nagpur

Demand-driven methods for program analysis have primarily been viewed as efficient algorithms for computing the same information as the corresponding exhaustive methods, but for a given set of demands. We found this counterintuitive. Hence we tried to connect a known fact (“demand-driven methods compute less information”) with a self-evident intuition (“imprecision caused by abstraction should increase with the amount of data abstracted”). This investigation led to our contribution of designing a demand-driven flow-sensitive alias analysis (which we call PDFSA). PDFSA is not only efficient but also more precise than exhaustive flow-sensitive alias analysis (EFSA). We formalize PDFSA using an inherent property of demand-driven flow-sensitive alias analysis: demands are propagated against the control flow, and aliases are propagated along the control flow. Traditionally, this has been seen as a collection of two separate analyses whose interaction is controlled by an algorithm that drives them. We formalize this algorithmic view as a bidirectional data flow analysis to define PDFSA declaratively. Further, we define a Meet Over Paths (MoP) solution for bidirectional flows for reasoning about the soundness of PDFSA. Our definition generalizes the classical definition of MoP, which is restricted to unidirectional flows. We have implemented PDFSA, an existing demand-driven alias analysis (which we call ADFSA), and EFSA for the static resolution of virtual function calls in C++, for constructing more precise call graphs. Our measurements show that the call graphs computed using PDFSA are indeed more precise than those computed using ADFSA or EFSA.

6:45pm – 7:15pm        Efficient Fixpoint Computation for Abstract Interpretation, Aditya Thakur, UC Davis, USA

Abstract interpretation is a general framework for expressing static program analyses. It reduces the problem of extracting properties of a program to computing an approximation of the least fixpoint of a system of equations. The de facto approach for computing this approximation is Bourdoncle’s algorithm, based on weak topological order (WTO). This talk describes recent work on improving the time and memory performance of abstract interpretation without sacrificing precision. First, we present a deterministic parallel algorithm for fixpoint computation by introducing the notion of weak partial order (WPO), which generalizes WTO. We present an almost-linear-time algorithm for computing a WPO. Second, we present a technique for memory-efficient fixpoint computation. Both techniques are proved to be optimal while computing the same result as Bourdoncle’s approach. The talk will also describe the implementation and evaluation of these techniques in the NASA IKOS and Facebook SPARTA abstract interpreters.
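The fixpoint problem being optimized can be seen in miniature with classic Kleene iteration over the interval domain, here for the loop "x = 0; while x < 10: x += 1", with widening and a single narrowing step (a textbook sketch; the WTO/WPO iteration scheduling the talk is about is not shown):

```python
# Interval-domain fixpoint for "x = 0; while x < 10: x += 1".
def join(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    # jump unstable bounds to infinity to force termination
    lo = old[0] if new[0] >= old[0] else float("-inf")
    hi = old[1] if new[1] <= old[1] else float("inf")
    return (lo, hi)

def body(inv):
    # one loop iteration: filter x < 10, then add 1
    return (inv[0] + 1, min(inv[1], 9) + 1)

inv = (0, 0)                       # interval for x at the loop head (x = 0)
while True:                        # ascending iteration with widening
    new = widen(inv, join(inv, body(inv)))
    if new == inv:
        break
    inv = new
inv = join((0, 0), body(inv))      # one narrowing (descending) step
```

Widening first over-approximates the invariant to [0, +inf), and the descending step recovers the precise loop-head invariant [0, 10].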

Schedule 16th Dec. 2020

Session 1: Program Synthesis/Machine Learning, Session chair: Deepak D’Souza

 

3:30pm – 4:00pm        Models and Programs: Better Together, Sriram Rajamani, Microsoft Research India

Over the past decade, we have seen ubiquitous adoption of Machine Learning (ML) models across many different application domains. In this talk, we explore the connections between ML models and programs, and argue that there are significant advantages in combining ML models with programmatic representations. We present results from our work across many different Machine Learning perspectives in support of this hypothesis, including Supervised Learning, Bayesian Learning and Reinforcement Learning. In each case, we show advantages in constructing human readable and interpretable programmatic representations of ML models, and in combining tools and techniques from program analysis and program synthesis with machine learning. We conclude with opportunities in using programming language techniques to make ML models efficient, robust, interpretable, and verifiable.

4:00pm – 4:30pm        Quantified Invariants via Syntax-Guided Synthesis, Sumanth Prabhu S, TCS Research Pune and IISc Bangalore

Automated reasoning about programs with arrays is an important problem, as arrays are a ubiquitous data structure. However, such reasoning often requires invariants that are quantified universally over ranges of array elements. Such invariants are difficult to find and difficult to prove inductive. In this talk, I present our recent algorithm for generating universally quantified invariants, based on enumerative search. The algorithm discovers the invariant in three stages. First, it discovers a range for the invariant by exploiting program syntax. Then, it identifies useful properties about individual elements of arrays and generalizes them to the entire range to get candidate invariants. Finally, it filters out wrong candidates using heuristics and an SMT solver. The implementation of the algorithm advances the state of the art on a wide range of array-handling programs.
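A minimal sketch of the enumerate-generalize-filter pipeline: for the loop "for i in range(n): a[i] = 0", enumerate candidate facts about individual elements, generalize them to "forall j in [0, i)", and filter candidates. Here filtering is done against concrete traces for simplicity, whereas the actual algorithm filters with heuristics and an SMT solver:

```python
# Loop-head states of "for i in range(n): a[i] = 0".
def traces(n):
    a, out = [None] * n, []
    for i in range(n + 1):
        out.append((i, list(a)))      # snapshot at the loop head
        if i < n:
            a[i] = 0
    return out

# Candidate element facts, generalized to "forall j in [0, i): a[j] == c";
# the constants enumerated here are an invented example.
candidates = [("a[j] == %d" % c, (lambda a, j, c=c: a[j] == c))
              for c in (-1, 0, 1)]

# Keep only candidates that hold at every loop head of a concrete run.
def survives(pred, n=5):
    return all(pred(a, j) for i, a in traces(n) for j in range(i))

surviving = [name for name, pred in candidates if survives(pred)]
```

Only the true quantified invariant "forall j in [0, i): a[j] == 0" survives the filtering.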

4:30pm – 5:00pm        Resilient Abstraction-Based Controller Design, Stanly Samuel, IISc Bangalore

We consider the computation of resilient controllers for perturbed non-linear dynamical systems w.r.t. linear-time temporal logic specifications. We address this problem through the paradigm of Abstraction-Based Controller Design (ABCD), where a finite-state abstraction of the perturbed system dynamics is constructed and utilized for controller synthesis. In this context, our contribution is twofold: (I) we construct abstractions which model the impact of occasional high disturbance spikes on the system via so-called disturbance edges; (II) we show that applying resilient reactive synthesis techniques to these abstract models results in closed-loop systems which are optimally resilient to these occasional high disturbance spikes. We have implemented this resilient ABCD workflow on top of SCOTS and showcase our method through multiple robot-planning examples.

5:00pm – 5:30pm        IR2Vec: LLVM IR based Scalable Program Embeddings, S. Venkata Keerthy, IIT Hyderabad

To sidestep the complexities arising from NP-completeness, modern compilers use expert-designed heuristics for most optimizations. As such heuristics are not designed to give optimal results in all scenarios, there is increased scope for machine learning approaches to these problems, in pursuit of better results. To design such a solution, it is important to represent programs in a form that is amenable to learning.

In this work, we propose IR2Vec, a concise and scalable encoding infrastructure that represents programs as distributed embeddings in a continuous space. The embeddings are obtained by combining representation learning methods with flow information, to capture the syntax as well as the semantics of programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, the obtained embeddings are both language and machine independent. The entities of the IR are modeled as relationships, and their representations are learned to form a seed embedding vocabulary. Using this infrastructure, we propose two incremental encodings: Symbolic and Flow-Aware. Symbolic encodings are obtained from the seed embedding vocabulary, and Flow-Aware encodings are obtained by augmenting the Symbolic encodings with flow information.

We show the effectiveness of our methodology on two optimization tasks (heterogeneous device mapping and thread coarsening). Our way of representing programs enables us to use non-sequential models, resulting in orders-of-magnitude faster training. The encodings generated by IR2Vec outperform existing methods on both tasks. In particular, our results improve or match the state-of-the-art speedup in 11/14 benchmark suites in the device mapping task across two platforms, and in 53/68 benchmarks in the thread coarsening task across four different platforms. Compared to other methods, our embeddings are more scalable, less data-hungry, and have better out-of-vocabulary characteristics.

Session 2: Software Faults, Testing, and Re-engineering, Session chair: Akash Lal

 

5:45pm – 6:15pm        Diagnosing Software Faults Using Multiverse Analysis, Prantik Chatterjee, IIT Kanpur

Spectrum-based Fault Localization (SFL) approaches aim to efficiently localize faulty components by examining program behavior. This is done by collecting, into a spectrum, the execution patterns of various combinations of components and the corresponding outcomes. Efficient fault localization depends heavily on the quality of the spectra. Previous approaches, including the current state-of-the-art Density-Diversity-Uniqueness (DDU) approach, attempt to generate “good” test suites by improving certain structural properties of the spectra. In this work, we propose a different approach, Multiverse Analysis, that considers multiple hypothetical universes, each corresponding to a scenario where one of the components is assumed to be faulty, to generate a spectrum that reduces the expected worst-case wasted effort over all the universes. Our experiments show that Multiverse Analysis not only improves the efficiency of fault localization but also achieves better coverage and generates smaller test suites than DDU, the current state-of-the-art technique. On average, our approach reduces developer effort over DDU by more than 16% for more than 92% of the instances. Further, the improvements over DDU are statistically significant under the paired Wilcoxon signed-rank test.
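For context, the suspiciousness computation that consumes such a spectrum can be sketched with the standard Ochiai formula; this is textbook SFL scoring, not the Multiverse technique itself, and the spectrum below is invented:

```python
import math

# Classical Ochiai suspiciousness over a spectrum.
# spectrum[t][c] == 1 iff test t covers component c;
# outcomes[t] is True iff test t failed.
def ochiai(spectrum, outcomes):
    total_fail = sum(outcomes)
    scores = []
    for c in range(len(spectrum[0])):
        ef = sum(1 for t, row in enumerate(spectrum) if row[c] and outcomes[t])
        ep = sum(1 for t, row in enumerate(spectrum) if row[c] and not outcomes[t])
        denom = math.sqrt(total_fail * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

spectrum = [[1, 1, 0],   # failing test: covers c0, c1
            [0, 1, 1],   # failing test: covers c1, c2
            [1, 0, 1]]   # passing test: covers c0, c2
outcomes = [True, True, False]
scores = ochiai(spectrum, outcomes)   # c1, covered by every failure, ranks top
```

The quality of the spectrum determines how well such scores separate the faulty component, which is exactly what better test-suite generation targets.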

6:15pm – 6:45pm        Monolith Decomposition based on Business Functionality Inference, Shivali Agarwal, IBM Research India

The problem of automatically extracting microservices from monolithic applications has seen much interest in recent years. Most of the proposed techniques rely on finding partitions of the source code that align with technical components or with the implicit modules in the existing code structure. While this is acceptable for software modularization and reverse-engineering purposes, splitting out microservices is better done on the basis of business functionality rather than technical components. In this talk, we will cover a novel approach for monolith decomposition that maps the implementation structure of a monolith application to a functional structure, which in turn can be mapped to business functionality.

We use formal concept analysis, used in the literature for feature identification, in a novel way to infer the functional structure of the monolith application. This is preceded by an abstraction of the monolith code into a novel set of code features that capture the functionality-relevant code structure and flows, and that are largely agnostic of the programming language. These code features are used by the proposed functional decomposition technique. The approach then applies a clustering technique that groups classes into three types of partitions: 1) candidate microservices, 2) a utility class group, and 3) a group of classes that require significant refactoring to enable a clean microservice architecture. The clustering technique attempts to maximize cohesion and minimize coupling both at the implementation-structure level and at the functional-structure level. This results in microservice candidates that are naturally aligned with the different business functions exposed by the application, while also exploiting implicit implementation boundaries in the monolithic code.

We have evaluated our approach on multiple Java and .NET applications against three baseline techniques. We will present the results on open benchmark applications like DayTrader and demonstrate the candidate microservice advisor tool released by us for decomposing enterprise legacy applications.

6:45pm – 7:15pm        Flaky Tests: Some Results and Research Challenges, Darko Marinov, UIUC, USA

Testing is the most common approach in practice to check software. Regression testing checks software changes. A key challenge for regression tests is to detect software bugs and fail when a change introduces a bug, signaling to the developer to fix it. However, an emerging challenge is that the tests must also _not_ fail when there is _no_ bug in the change. Unfortunately, some tests, called flaky, can non-deterministically pass or fail on the same software, even when it has no bug. Such flaky tests give false alarms to developers about the existence of bugs. A developer may end up wasting time trying to fix bugs not relevant to the recent changes the developer made. I will present some work done by my group and our collaborators to alleviate the problem of flaky tests. I will also discuss some remaining challenges, especially for researchers in programming languages.

Schedule 17th Dec. 2020

Session 1: Race Detection, Session chair: K. Narayan Kumar

 

2:00pm – 2:30pm        Static Data Race Detection for Android Models and Programs, Abhishek Vijay Uppar, IISc Bangalore

A data race takes place in an execution when two conflicting accesses to a memory location occur “in parallel”. Detecting races in Android applications is challenging due to their non-standard concurrent execution model, involving callbacks, looper threads, and posting of tasks to threads. In this work we model Android applications as a class of Event Driven Programs (EDP), characterized by looper threads with FIFO queues and posting of tasks to these threads. We extend the notion of disjoint blocks, introduced in “Data Races and Static Analysis for Interrupt-Driven Kernels” by Chopra et al., to propose a static may-happen-in-parallel (MHP) analysis for EDP programs. The analysis is based on the key notion of an abstract task graph, which models the posting of tasks from one thread to another in an EDP program. We then give a small set of rules, based on the abstract task graph of a program, that soundly capture when two statements may not happen in parallel.

We have manually evaluated our analysis on fragments of a few Android apps, such as OpenTracks and Aard2, and the precision appears to be good. We are currently implementing the analysis in the FlowDroid/Soot framework.

2:30pm – 3:00pm        Static Race Detection for Embedded Systems, Rekha Pai, IISc Bangalore

Embedded systems are increasingly employed in safety-critical situations, and the presence of races may lead to erroneous behaviour with serious consequences. Detecting races is one step towards proving the correctness of programs. Two statements are involved in a data race if they are conflicting accesses to a shared memory location and can happen “simultaneously”, or one after another. The problem of race detection is well studied for classical concurrent systems, which have standard synchronization techniques (using lock-unlock) and thread-interleaving semantics. Embedded systems typically use non-standard synchronization mechanisms, such as disabling and enabling interrupts, suspending and resuming the scheduler or specific threads, dynamically raising and lowering thread priorities, and the use of flags, in addition to standard locks. This brings in complex context-switching possibilities, which makes it challenging to detect races with high precision.

This talk will outline our recent work on detecting races in non-standard concurrent systems. First, the talk will introduce some non-standard concurrent programming models and the challenges they bring to race detection. This motivates a new definition of race suitable for such systems. The talk will then focus on the novel idea of disjoint blocks, which enables race detection with high precision.
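The idea of conflicting accesses happening "simultaneously or one after another" can be demonstrated deterministically by enumerating interleavings of two read-increment-write threads on a shared counter: some schedules lose an update. This is a generic illustration of a race, not the talk's analysis:

```python
from itertools import permutations

# Two "threads" each perform read-increment-write on a shared counter;
# the final value depends on how their steps interleave.
def run(order):
    shared, local = 0, {}
    for thread, op in order:
        if op == "read":
            local[thread] = shared
        else:                          # "write": store local value + 1
            shared = local[thread] + 1
    return shared

steps = [("t1", "read"), ("t1", "write"), ("t2", "read"), ("t2", "write")]

def valid(p):                          # keep each thread's program order
    return (p.index(("t1", "read")) < p.index(("t1", "write")) and
            p.index(("t2", "read")) < p.index(("t2", "write")))

finals = {run(p) for p in permutations(steps) if valid(p)}
```

Serial schedules yield 2, but schedules where both reads precede both writes lose an update and yield 1; that schedule-dependence is the race.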

3:00pm – 3:30pm        LLOV: A Fast Static Data-Race Checker for OpenMP Programs, Utpal Bora, IIT Hyderabad

Data races are a common source of bugs in parallel programs. Though frameworks such as OpenMP make it easier to specify parallelism in programs, detecting races within such programs remains a challenging task. Multiple data race checkers have been developed for pthread-based parallel programs. However, very little progress has been made in designing and developing static race checkers for high-level parallel programming languages and APIs such as OpenMP. This is evident from the scarcity of non-commercial or open-source static race checkers for OpenMP programs. We developed a language-agnostic static data race checker for detecting data races within OpenMP programs. Our tool, LLOV, is based on polyhedral compilation techniques. The precise dependence analysis of the polyhedral model also enables LLOV to prove data race freedom in OpenMP programs. LLOV works on the Intermediate Representation (IR) of the LLVM compiler, making it language independent; we have tested it on C, C++, and Fortran programs. Designing a language-independent static data race checker is challenging even in a well-designed compiler framework like LLVM, as the high-level semantics and language subtleties of the input languages and the parallelization directives are not structurally preserved in the IR. We overcome these hurdles in LLOV by a three-pronged approach: (i) reconstructing OpenMP pragmas by carefully analyzing properties and patterns in the IR, (ii) extending the existing infrastructure of Polly to compute exact dependences in parallel programs, and (iii) applying the Integer Set Library (ISL) for polyhedral operations. We show that the precision, accuracy, and F1 score of LLOV on well-established benchmarks are comparable to those of state-of-the-art dynamic race checkers, while being orders of magnitude faster.
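The dependence question at the heart of polyhedral race checking can be sketched in miniature: in a parallel loop "for i: a[i] = a[i+1]", a race exists iff the write index of one iteration equals the read index of a different iteration. The sketch below checks that constraint by enumerating a small bounded domain, whereas ISL, as used in the actual tool, resolves such constraints symbolically and exactly:

```python
# Cross-iteration conflict test: does some iteration i1's write index
# coincide with a different iteration i2's read index? (Brute force over
# a bounded domain; a polyhedral library solves this symbolically.)
def has_cross_iteration_conflict(write_idx, read_idx, n):
    return any(write_idx(i1) == read_idx(i2) and i1 != i2
               for i1 in range(n) for i2 in range(n))

# "for i: a[i] = a[i+1]" run in parallel: write a[i1] vs read a[i2+1]
racy = has_cross_iteration_conflict(lambda i: i, lambda i: i + 1, 8)

# "for i: a[i] = a[i] + 1": each iteration touches only its own element
race_free = not has_cross_iteration_conflict(lambda i: i, lambda i: i, 8)
```

The same style of reasoning, lifted to exact integer-set constraints, is what lets a polyhedral checker both find races and prove their absence.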

 

Session 2: Security, Session chair: Aseem Rastogi

 

3:45pm – 4:15pm        Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities, Yannis Smaragdakis, University of Athens, Greece

Smart contracts on permissionless blockchains are exposed to inherent security risks due to interactions with untrusted entities. Static analyzers are essential for identifying security risks and avoiding millions of dollars’ worth of damage. We introduce Ethainter, a security analyzer that checks information flow with data sanitization in smart contracts. Ethainter identifies composite attacks that involve an escalation of tainted information, through multiple transactions, leading to severe violations. The analysis scales to the entire blockchain, consisting of hundreds of thousands of unique smart contracts deployed over millions of accounts. Ethainter is more precise than previous approaches, as we confirm by automatic exploit generation (e.g., destroying over 800 contracts on the Ropsten network) and by manual inspection, showing a very high precision of 82.5% valid warnings for end-to-end vulnerabilities. Ethainter’s balance of precision and completeness offers significant advantages over other tools such as Securify, Securify2, and teEther.
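The flavor of taint propagation with sanitization can be sketched on a toy straight-line "contract". All names and the statement format below are invented for illustration, and the real analysis is a whole-blockchain declarative analysis rather than this interpreter-style sweep:

```python
# Taint propagation with sanitization over a list of statements of the
# form (kind, destination, sources). A composite attack: attacker input
# taints a guard variable that later controls a privileged sink.
def propagate(stmts, tainted):
    warnings = []
    for kind, dst, srcs in stmts:
        if kind == "sanitize":
            tainted.discard(dst)                 # sanitized data is trusted
        elif kind == "assign":
            if any(s in tainted for s in srcs):
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif kind == "sink" and any(s in tainted for s in srcs):
            warnings.append(dst)                 # tainted data reaches a sink
    return warnings

attack = [("assign", "owner", ["caller_input"]),
          ("assign", "guard", ["owner"]),
          ("sink", "selfdestruct", ["guard"])]
fixed = [("assign", "owner", ["caller_input"]),
         ("sanitize", "owner", []),
         ("assign", "guard", ["owner"]),
         ("sink", "selfdestruct", ["guard"])]

flagged = propagate(attack, {"caller_input"})    # warning raised
clean = propagate(fixed, {"caller_input"})       # sanitizer kills the flow
```

Modeling sanitization is what keeps such an analysis precise: without it, every guarded flow would be flagged as an attack.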

4:15pm – 4:45pm        Pifthon: A Flow Secure Analyzer for Imperative Programming Languages, Sandip Ghosal, IIT Bombay

Information flow control (IFC) plays a vital role in controlling the dissemination of information among the mutually distrusting subjects/objects of a program. For the last two decades, researchers have been building tools and techniques for enforcing information flow security in programming languages, or for certifying a program against a given information flow policy (IFP), which specifies constraints on information propagation. However, the use of existing IFC tools and techniques in real application development or certification is far from reality, due to imprecise flow analysis, difficulty of use, and unsatisfactory declassification techniques. Precision mainly depends on the mechanism that binds labels to the subjects/objects of a program. The majority of existing IFC techniques follow either static or dynamic labeling. While static labeling often results in false positives during program certification, pure dynamic labeling could plausibly declare all programs to be flow secure. We have arrived at a dynamic labeling algorithm (DL) that adopts a hybrid (static+dynamic) labeling approach to analyze programs for flow security. Integrating the DL algorithm with the Readers-Writers Flow Model (RWFM), we have built a compile-time flow analyzer called Pifthon for a dialect of Python. The flow analyzer aids in identifying possible misuse of information in a sequential program with respect to a given IFP. Pifthon captures forward inter-statement implicit flows and backward flows in loop statements, and inherits from RWFM a robust declassification that is complete w.r.t. flow policies. Further, Pifthon has distinct advantages such as reduced labeling overhead and flow-violation detection, making it quite usable; it covers a wide range of channels, including termination and progress-sensitive channels.
In this talk, we shall present the labeling algorithm as well as the declassification rule that has been embedded in it, and illustrate the tool on several benchmark programs and provide a comparison with that of existing tools/techniques.
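A readers-writers style flow check, loosely inspired by RWFM labels (this is our simplified reading for illustration, not the exact RWFM rules): a label pairs a set of permitted readers with a set of influencing writers, and a flow is permitted only when it shrinks readership and grows the recorded writer set:

```python
# Simplified readers-writers flow rule: information may flow from label
# l1 to label l2 only if l2's readers are a subset of l1's readers
# (readership never widens) and l1's writers are a subset of l2's
# writers (every influence stays recorded).
def can_flow(l1, l2):
    r1, w1 = l1
    r2, w2 = l2
    return r2 <= r1 and w1 <= w2

public = ({"alice", "bob"}, {"alice"})   # readable by alice and bob
secret = ({"alice"}, {"alice", "bob"})   # readable by alice only

ok = can_flow(public, secret)     # readers shrink, writers grow: allowed
leak = can_flow(secret, public)   # would widen readership: rejected
```

A static or dynamic labeler, like the hybrid DL algorithm above, is essentially deciding such checks at every program point while also inferring the labels themselves.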

4:45pm – 5:15pm        Verse: an EDSL for Cryptographic Primitives, Abhishek Dang, IIT Kanpur

Despite impressive work on code optimisation in modern compilers, cryptographic primitives are still being written in low-level languages like C and even assembly. While performance is an important consideration, the unpredictable nature of modern optimising compilers for high-level languages leaves primitives written in them vulnerable to various side-channel attacks. However, in this bargain we have lost out on the safety, portability, and maintainability that come naturally with a high-level language. We present Verse (“Verse: An EDSL for Cryptographic Primitives”, in the 20th International Symposium on Principles and Practice of Declarative Programming, ACM, 2018), a DSL embedded inside Coq for writing cryptographic primitives, which addresses some of these shortcomings. Verse is a low-level language for writing cryptographic primitives, albeit with a relatively high-level interface. A Verse programmer can generate Verse code using functions written in Gallina, the functional programming language underlying Coq. This meta-programming facility can be seen as assembler macros, and is used to provide many utility functions for coding patterns like looping, register caching, etc. These features make the Verse programming experience remarkably high-level. To the programmer, Verse appears to be a typed low-level language, with facilities for defining and using assembler macros, and with strong compile-time type safety. The primary motivation for embedding inside Coq, a proof assistant, was to prove functional properties of Verse code in the same environment where the code is written. The parametric nature of the Verse AST and the heavy use of the Coq sectioning mechanism in writing the code make it hard to give a clean interface for proofs. Since the publication at PPDP ’18, we have provided the facility to add annotations encoding program properties in Verse code itself. Extraction and presentation of proof obligations from annotated code is now very usable.
We are working on a newer, monoid-based framework for semantics. This would pave the way for even more features, such as inline machine code, modular code and proofs, and even apparently recursive code.

We hope to make a case for embedded languages as the right approach for narrow programming domains focused on security and verification.