eMOP: A Maven Plugin for Evolution-Aware Runtime Verification

eMOP: A Maven Plugin

for Evolution-Aware Runtime Veriﬁcation

Ayaka Yorihiro, Pengyue Jiang, Valeria Marqués,

Benjamin Carleton, Owolabi Legunsen

Cornell University, USA

{ay436, pj257, vmm49, bc534, legunsen}@cornell.edu

Abstract. We present eMOP, a tool for incremental runtime veriﬁca-

tion (RV) of test executions during software evolution. We previously

used RV to ﬁnd hundreds of bugs in open-source projects by monitoring

passing tests against formal speciﬁcations of Java APIs. We also proposed

evolution-aware techniques to reduce RV’s runtime overhead and human

time to inspect speciﬁcation violations. eMOP brings these beneﬁts to

developers in a tool that seamlessly integrates with the Maven build sys-

tem. We describe eMOP’s design, implementation, and usage. We eval-

uate eMOP on 676 versions of 21 projects, including those from our ear-

lier prototypes’ evaluation. eMOP is up to 8.4× faster and shows up to

31.3× fewer violations, compared to running RV from scratch after each

code change. eMOP also does not miss new violations in our evaluation,

and it is open-sourced at https://github.com/SoftEngResearch/emop.

1 Introduction

The prevalence of costly and harmful bugs in deployed software underscores the

need for techniques to help ﬁnd more bugs during testing. Runtime veriﬁcation

(RV) [3, 13, 14, 26, 30, 38, 44] is such a technique; it monitors executions against

formal speciﬁcations and produces violations if a speciﬁcation is not satisﬁed.

We previously used RV to amplify the bug-ﬁnding ability of tests [32,34,40].

We found hundreds of bugs by monitoring passing tests in hundreds of open-

source projects against Java API behavioral speciﬁcations [31]. Such speciﬁ-

cations should not change as client programs evolve. For example, monitoring

the Collections_SynchronizedCollection speciﬁcation [7] revealed several bugs,

e.g., [8, 9]: possible “non-deterministic behavior” caused by not synchronizing

on iterators over Collection.synchronizedCollection()’s output [27]. Develop-

ers conﬁrmed and ﬁxed these and many other bugs that RV helped us ﬁnd.

We also found that RV incurs runtime overhead and requires a lot of human

time to inspect violations. To reduce RV costs, we proposed three evolution-aware

techniques that focus RV and its users on code aﬀected by changes [35, 37]:

(1) Regression Property Selection (RPS) re-checks, in a new code version,

a subset of speciﬁcations that may be violated in code aﬀected by changes.

(2) Violation Message Suppression (VMS) displays new violations—users

are more likely to deal with violations that are related to their changes [41].

(3) Regression Property Prioritization (RPP) monitors important speciﬁ-

cations on developers’ critical path; others are monitored in the background.

2 Yorihiro et al.

These evolution-aware techniques reduce RV costs, but our proof-of-concept pro-

totypes are hard to integrate with open-source projects. Also, these techniques

can be used together but our prototypes do not allow users to easily do so.

We present eMOP, a Maven plugin for incremental RV of tests during soft-

ware evolution. Maven is a popular build system, so eMOP can bring the ben-

eﬁts of RV to a wider audience of developers. eMOP improves RPS and RPP,

re-implements VMS, and allows users to easily combine RPS, VMS, or RPP.

Components in eMOP’s architecture (1) extend Maven’s sureﬁre plugin [48]

to perform analysis before and after monitoring tests; (2) use STARTS [36] to

reason about code changes and ﬁnd classes aﬀected by changes; (3) re-conﬁgure

a JavaMOP [29] Java agent on the ﬂy to select which speciﬁcations to monitor

and where to instrument them; and (4) get ﬁne-grained change information from

Git for computing new violations. Once installed, users only need to change few

Maven conﬁguration lines to start using RPS, VMS, RPP, or their combination.

We evaluate eMOP on 676 versions of 21 projects. On the subset of projects

and their versions that we previously used to evaluate our prototypes, eMOP

produces similar results. Overall, on average, eMOP is up to 8.4× faster (aver-

age: 4.0×) and shows up to 31.3× fewer violations (average: 11.8×), compared

to using JavaMOP to perform RV from scratch after each code change.

We deﬁned an evolution-aware RV technique as safe if it ﬁnds all new vio-

lations after a change, and precise if it ﬁnds only new violations [37]. Also, we

proposed two sets of RPS variants: two variants that are theoretically safe and

ten variants that are not, under the assumptions that we make.

Our prior evaluation [37] showed that all RPS variants (including theoreti-

cally unsafe ones) were empirically safe. But, on projects that we did not pre-

viously evaluate, we initially ﬁnd that our theoretically unsafe RPS variants are

not empirically safe if 3rd-party libraries change. So, to improve RPS safety, we

rerun RV from scratch when libraries change. We ﬁnd that RPS with eMOP is

empirically safe in all projects and versions that we evaluate in this paper.

Table 1: eMOP vs. our early prototypes [37].

Feature Prototype eMOP

Maven integration ✗ ✓

Single-module projects ✓ ✓

Multi-module projects ✗ ✓/✗

RPS ✓ ✓

VMS ✓ ✓

RPP ✓ ✓

RPS + VMS ✗ ✓

RPP + VMS ✗ ✓

RPS + RPP 6 variants 12 variants

RPS + RPP + VMS ✗ ✓

Safe w.r.t. CUT ✓ ✓

Safe w.r.t. 3rd-party lib ✗ ✓

Version comparison jDiﬀ jGit

Ease of conﬁguration low high

Comparison with our previ-

ous prototypes. Table 1 com-

pares eMOP with our origi-

nal prototypes [37]. ✓ means

“supported”; ✗ means “not sup-

ported”, and ✓/✗ means “par-

tially supported”. Unlike eMOP,

our prototypes (1) do not inte-

grate with Maven or work on

multi-module projects; (2) par-

tially support combining RPS

and RPP, but do not combine

RPS or RPP with VMS; (3) are

less safe when 3rd-party libraries

change; (4) use an external tool

to re-obtain change information

eMOP: A Maven Plugin for Evolution-Aware Runtime Veriﬁcation 3

that is already in Git; and (5) are hard to conﬁgure. eMOP will aid future

evolution-aware RV research; it is on GitHub [12], as are our artifacts [11].

2 eMOP

We summarize evolution-aware RV, and our eMOP implementation. Our origi-

nal paper [37] has theoretical background, examples, deﬁnitions, diagrams, etc.

2.1 Evolution-Aware RV Techniques

Regression Property Selection (RPS). The inputs to RPS are the old and

new program versions, and the set of Java API speciﬁcations. The outputs are

aﬀected speciﬁcations that may be violated in code aﬀected by changes. RPS

uses class-level static change-impact analysis [36] to ﬁnd impacted classes that

transitively depend on changed code. Then, RPS analyzes impacted classes to-

gether with all available speciﬁcations, and outputs speciﬁcations involving API

methods that are called in impacted classes.

There are 12 RPS variants that diﬀer in how impacted classes are computed

(three options), and where aﬀected speciﬁcations are instrumented (four op-

tions) [37]. Let ∆ be the set of changed classes. Impacted classes are computed

in three ways—(a) ps

: ∆ and its dependents—classes that use or extend those in

∆; (b) ps

: classes in (a) plus dependees—classes that those in ∆ use or extend;

and (c) ps

: classes in (b) plus dependees of ∆’s dependents. Impacted classes

are always instrumented, but there are two Boolean options (and four ways to

combine them) for whether to instrument unimpacted classes or 3rd-party li-

braries. Theoretically, these variants diﬀer in how much safety they trade oﬀ for

eﬃciency. But, all variants were empirically safe in our original evaluation [37].

Violation Message Suppression (VMS). To reduce human time for inspect-

ing speciﬁcation violations, VMS aims to show only new violations after code

changes. VMS does not reduce RV runtime overhead. The rationale behind VMS

is that developers are more likely to look at and debug new violations, compared

to looking at all old and new violations at the same time [41]. VMS takes the set

of all violations in the new version, ﬁlters out those for which there is evidence

that they are old violations, and presents the rest to the user as new violations.

Regression Property Prioritization (RPP). The goal is to reduce the time

to see important violations, so users may react faster. RPP splits RV into two

phases. Important speciﬁcations, deﬁned by the user, are monitored in the critical

phase and any violations of those speciﬁcations are reported immediately. The

other speciﬁcations are monitored in a background phase that users do not have

to wait for. Users can decide when and how violations from the background

phase are presented or when speciﬁcations should be automatically promoted

(demoted) from (to) the background phase.

2.2 Implementation

We implement 12 RPS variants, VMS, and RPP in our eMOP Maven plugin. We

choose Maven (1) so users can more easily integrate evolution-aware RV, (2) to

4 Yorihiro et al.

.git

Old metadata

Code + tests in

new version

All available

speciﬁcations

New metadata

New violations

All violations

VMS

JavaMOP

mvn emop:vms

CIA

Impacted

classes

Affected

speciﬁcations

RPP

Important

speciﬁcations

Background

speciﬁcations

mvn emop:rppmvn emop:affected-specsmvn emop:rps

mvn emop:impacted

RPS

Fig. 1: eMOP’s architecture.

ease evolution-aware RV usage during testing, and (3) because we built Maven

plugins before [21, 36]. Future work can add eMOP to other build systems.

Figure 1 shows eMOP’s architecture, and how components map to some

available user commands (see §3). There, ovals show processes and rectangles

show data. “Old Metadata” and “New Metadata” contain a per-class mapping

from non-debug related bytecode to checksums computed from the old and new

code versions, respectively, the classpath, and checksums for the jars on the

classpath. eMOP uses STARTS [36] to compute these mappings and classpath

information is used to detect changes in 3rd-party libraries. Also “.git” is Git’s

internal database of changes, which eMOP uses to ﬁnd which violations are

likely new. “CIA” represents our modiﬁed change impact analysis in STARTS;

eMOP uses “CIA” to compute impacted classes in three ways (§2.1). Lastly,

eMOP invokes “JavaMOP” to monitor test executions.

RPS. We invoke the AspectJ compiler, ajc [1], to statically analyze which spec-

iﬁcations are related to impacted classes. JavaMOP speciﬁcations are written in

an AspectJ dialect, so our static analysis outputs, as aﬀected, speciﬁcations that

ajc compiles into any impacted class. To reduce analysis cost, eMOP invokes

ajc on stripped-down speciﬁcations that contain only method-related informa-

tion. Finally, based on ajc’s output, eMOP modiﬁes a JavaMOP agent on the

ﬂy to only monitor aﬀected speciﬁcations and instrument them in locations re-

quired by the RPS variant.

VMS. We re-implement VMS on top of JGit [28] (instead of jDiﬀ). JGit provides

an API for extracting ﬁne-grained information from “.git”. By default, eMOP

takes the most recent Git commit as the old version and the current working

tree as the new version. So, VMS users can check if code changes introduce

new violations before making a commit. Users can also specify what commit to

compare the working tree against, or the ID of any two commits.

In the ﬁrst VMS run, all violations are new. Subsequently, VMS analyzes

all violations from the old version against the code change. If a speciﬁcation is

violated in the same class and on a line that is mapped to the same location in

eMOP: A Maven Plugin for Evolution-Aware Runtime Veriﬁcation 5

both versions, VMS ﬁlters it out as old; the rest are presented as new violations.

VMS users can choose to write new or old violations to the console, a ﬁle on

disk, or both.

RPP. Users can provide a ﬁle containing important speciﬁcations for RPP to

monitor in the critical phase; the rest are monitored in the background. Users

can also provide a ﬁle containing important speciﬁcations and a ﬁle containing

a disjoint set of speciﬁcations to be monitored in the background phase. If users

do not provide a ﬁle, RPP uses the following default scheme. The ﬁrst time,

RPP monitors all speciﬁcations in the critical phase. Only speciﬁcations that

are violated in the ﬁrst run are monitored in the critical phase in the second

run; the rest are monitored in the background. Subsequently, a speciﬁcation

that is violated in the background phase is promoted to the critical phase in

the next run. Users can manually demote speciﬁcations from the critical phase.

Future work can add options to let users specify when a speciﬁcation should be

automatically promoted to or demoted from the critical phase. RPP currently

does not support multi-module projects.

Combinations. eMOP users can combine RPS, VMS, and RPP. When using

all three together, RPS ﬁrst ﬁnds aﬀected speciﬁcations, then RPP splits the

monitoring of aﬀected speciﬁcations into critical and background phases before

VMS shows new violations from RPP’s critical phase. When using VMS with

RPS or RPP, the other techniques are run ﬁrst, then VMS shows new violations.

Running RPS with RPP works in the same order as when combining all three.

3 Installation and Usage

Installing eMOP. eMOP can be installed by following the directions in the

“Installation” section of the README.md ﬁle on eMOP’s GitHub page [12]. Note

that installing eMOP from sources requires satisfying all prerequisites that are

listed on that GitHub page.

Integrating eMOP. To use eMOP in a Maven project, modify that project’s

conﬁguration ﬁle—typically called pom.xml—to add the latest version of eMOP

and the JavaMOP agent argument to the conﬁguration of the Maven sureﬁre

plugin (which runs tests) ﬁle. The current way to add the eMOP to a project

is to modify the pom.xml ﬁle like so:

1 <build><plugins>

2 ...

3 <plugin>

4 <groupId>org.apache.maven.plugins</groupId>

5 <artifactId>maven−sureﬁre−plugin</artifactId>

6 <version>2.20−or−greater</version>

7 <conﬁguration> <argLine>−javaagent:${JavaMOPAgent.jar}</argLine>

8 </conﬁguration>

9 </plugin>

10 <plugin>

11 <groupId>edu.cornell</groupId>

12 <artifactId>emop−maven−plugin</artifactId>

13 <version>${latest_eMOP_version}</version>

14 </plugin>

15 ...

16 </plugins></build>

6 Yorihiro et al.

Using eMOP. These commands allow users to: (1) list impacted classes or

aﬀected speciﬁcations, or (2) run RPS, VMS, RPP, and their combinations:

1 $ mvn emop:help # list all goals (commands)

2 $ mvn emop:impacted # list impacted classes

3 $ mvn emop:aﬀected−specs # list aﬀected speciﬁcations

4 $ mvn emop:rps # run RPS

5 $ mvn emop:rpp # run RPP

6 $ mvn emop:vms # run VMS

7 $ mvn emop:rps−rpp # run RPS+RPP

8 $ mvn emop:rpp−vms # run RPP+VMS

9 $ mvn emop:rps−vms # run RPS+VMS

10 $ mvn emop:rps−rpp−vms # run RPS+RPP+VMS

11 $ mvn emop:clean # delete all metadata

The emop:help command lists all eMOP commands and what they do; the others

are related to evolution-aware RV. Next, we describe some conﬁguration options.

Running RPS. Three ﬂags choose among RPS variants: (1) closureOption

speciﬁes how to compute impacted classes: PS1, PS2, or PS3 (the default: PS3);

(2) includeLibraries controls whether to instrument 3rd-party libraries (de-

fault: true); and (3) includeNonAffected controls whether non-impacted classes

are instrumented (default: true). For example, this command runs RPS and

instruments neither classes that are not impacted nor 3rd-party libraries:

1 $ mvn emop:rps −DincludeLibraries=false −DincludeNonAﬀected=false

Running VMS. Users can see violations on the console or in a violation-counts

ﬁle. By default, only new violations are shown. But, users can view all violations

in the console or in the ﬁle via Boolean showAllInConsole and showAllInFile op-

tions, respectively (default: false). Users can specify commit IDs using lastSha

and newSha. For example, this command shows new violations relative to commit

ID abc123 in the console but it still outputs all violations to ﬁle:

1 $ mvn emop:vms −DlastSha=abc123 −DshowAllInFile=true

Running RPP. Users can provide speciﬁcations to RPP using two options

named criticalSpecsFile and backgroundSpecsFile. If only criticalSpecsFile

is provided, then all other speciﬁcations will be monitored in the background. By

default, RPP tracks metadata for critical and background phase speciﬁcations

as described in §2.1. But RPP also has a demoteCritical option (default: false)

for demoting previously important speciﬁcations that are not violated in the

critical phase of the current run to the background phase for the next run. For

example, this command monitors speciﬁcations in critical.txt (respectively,

background.txt) in the critical (respectively, background) phase:

1 $ mvn emop:rpp −DcriticalSpecsFile=critical.txt −DbackgroundSpecsFile=background.txt

Running Combinations. Options in the union of those from combined tech-

niques can be used, e.g., this command runs RPS+RPP using the RPS variant

that instruments neither classes that are not impacted nor 3rd-party libraries,

while also demoting speciﬁcations during RPP:

1 $ mvn emop:rps−rpp −DincludeLibraries=false −DincludeNonAﬀected=false −DdemoteCritical=true

eMOP: A Maven Plugin for Evolution-Aware Runtime Veriﬁcation 7

4 Evaluation

Setup. We evaluate eMOP on 21 Maven projects from our previous and ongoing

work on RV and regression testing. They are all single-module Maven projects

(RPP does not yet support multi-module projects). We use between 11 and 50

versions per project, for a total of 676 versions. Table 2 shows project names

(click or hover to see GitHub URL), number of versions that we evaluate (sha#),

sizes (KLOC), number of test classes in the ﬁrst version (TC), average test time

(test[s]), and average JavaMOP overhead for these versions (mop).

Table 2: Projects that we evaluate.

Project sha# KLOC TC test[s] mop

jgroups-aws 11 0.3 1 3.3 12.5

Yank 17 0.6 18 3.1 7.2

java-conﬁguration-impl 24 1.7 5 3.9 7.0

embedded-jmxtrans 50 3.1 15 4.1 12.3

jbehave-junit-runner 50 1.5 17 4.7 15.2

compile-testing 50 7.0 25 6.2 6.0

javapoet 20 8.1 18 9.1 6.6

exp4j 50 3.5 5 9.3 3.8

joda-time 50 93.3 158 10.8 3.9

jnr-posix 50 11.8 24 11.9 11.9

imglib2 20 33.0 81 13.0 2.5

HTTP-Proxy-Servlet 50 1.0 1 13.3 3.2

smartsheet-java-sdk 21 7.9 51 13.5 4.5

zt-exec 15 2.6 24 14.6 1.9

commons-imaging 20 44.5 107 18.4 4.9

jscep 50 3.2 50 18.9 3.7

commons-lang 20 80.7 172 21.5 3.1

datasketches-java 48 41.9 178 35.8 2.3

commons-dbcp 20 11.4 43 53.0 1.8

stream-lib 20 4.7 28 91.0 6.9

commons-io 20 32.7 114 102.1 4.1

We use the same versions for

projects that we evaluated in our

original paper [37]. To choose ver-

sions for the other projects, we it-

erate over the 500 most recent ver-

sions in each project (most recent

ﬁrst) and terminate when we have

tried all 500 or found 50 versions

that change at least one Java ﬁle,

compile, tests pass, JavaMOP does

not fail, and JavaMOP time is at

least 20 seconds. The versions that

we evaluate per project are in our

artifact repository [11].

For RPS, we measure the time

and the number of unique viola-

tions per variant. For VMS, we

measure the number of new and to-

tal violations per version. For RPP, we measure the critical and background

phase times. All overheads for RPS are computed from end-to-end times includ-

ing time for compilation, analysis, running tests, and monitoring. RPP overhead

is only for the critical phase. We run eMOP on 193 revisions (7 no longer com-

pile) of 10 projects from our original paper [37], using the same experimental

settings as before. We also run all eMOP variants on all 21 projects using Ama-

zon EC2 C5.4xlarge instances.

Results: comparing with prior evaluation. Solid bars in Figure 2 show

average overheads of JavaMOP and RPS variants for the 10 projects in our

prior evaluation [37]. RV overhead is t

test

; t

and t

test

are times with and

without JavaMOP, respectively. §2 describes ps

, ps

, and ps

; “ℓ” and “c” mean

3rd-party libraries and non-impacted classes, respectively, are not instrumented.

In Figure 2, all RPS variants reduce the average JavaMOP overhead, which

is 7.2× when our evolution-aware techniques are not applied. As expected, based

on how we designed these variants, ps

incurs the most RPS overhead (5.7×),

while ps

cℓ

incurs the least overhead (1.8×).

In general, excluding 3rd-party libraries has a signiﬁcant eﬀect on reducing

RPS overhead, as seen for example in the diﬀerence between ps

(5.7×) and ps

ℓ

(3.6×) in Figure 2. For libraries that do not depend on 3rd-party libraries, or

8 Yorihiro et al.

Fig. 2: Runtime overheads of, and violations from, JavaMOP and RPS variants

in eMOP for projects and versions in our original evolution-aware RV paper [37].

those that depend on libraries that do not use API methods related to monitored

speciﬁcations, library exclusion has negligible eﬀect.

Striped bars in Figure 2 show average numbers of unique violations per ver-

sion across these projects. RPS reduces the number of violations reported, but in

the best case (ps

cℓ

) it still shows an average of 23.4 violations—most of which are

old—after every code change. So, a technique like VMS is needed that reports

only the few new violations. The overheads and violations in Figures 2 follow

the same trends across RPS variants as in our original paper [37]. So, we are

more conﬁdent in how we implemented evolution-aware RV in eMOP. Tooling

and Maven overheads likely explain any diﬀerences with our old results.

Results on more subjects and versions than prior work. We discuss

details about how eMOP performs when all three evolution-aware RV techniques

are combined, and then discuss the contribution of each technique.

Striped bars in Figure 3 show average overheads per project when all RPS

variants are combined with RPP and VMS. The projects in Figure 3 include some

that were not in Figure 2, and some in Figure 2 are not in the ﬁgure because of

dependency issues when we moved experiments to the cloud. We show the RPS

variants in Figure 3 in decreasing order of average overheads.

In terms of total time incurred (not shown in Figure 3), the best-performing

variant in RPS+RPP+VMS is ps

cℓ

(it does not instrument 3rd-party libraries

or classes that are not impacted by changes). Comparing the striped mop and

cℓ

bars shows that, on average across these projects and versions, ps

cℓ

reduces

RV overhead by roughly 4.0× (a very rough estimate, since we take a mean of

means). The project with the biggest speedup saw a 8.4× reduction in overhead,

from 15.2× with JavaMOP to 1.8× with ps

cℓ

RPP Contributions to the Combination. Solid bars in Figure 3 show how

well RPS performs on its own. Comparing the mop and ps

cℓ

bars shows that RPS

reduces RV overhead by 2.4× when used alone. We elide per-project details for

eMOP: A Maven Plugin for Evolution-Aware Runtime Veriﬁcation 9

Fig. 3: Average runtime overheads of all RPS variants in eMOP when run alone,

and when each RPS variant is combined with RPP and VMS.

lack of space, but we discuss two observations. First, the project that beneﬁts the

most from using only RPS had a 4.9× overhead reduction (from 6.9× to 1.4×).

Table 3: Unique violations from JavaMOP

(not evolution-aware), and VMS.

Project sha# mop VMS mop VMS

(sum) (sum) (avg) (avg)

jgroups-aws 11 12 12 1.1 1.1

Yank 17 22 8 1.3 0.5

java-conﬁguration 24 68 8 2.8 0.3

embedded-jmxtrans 50 627 19 12.5 0.4

jbehave-junit-runner 50 248 9 5.0 0.1

compile-testing 50 541 48 10.8 1

javapoet 20 180 9 9 0.5

exp4j 50 0 0 0 0

joda-time 50 1100 43 22 0.9

jnr-posix 50 1659 72 33.2 1.4

imglib2 20 1540 78 77 3.9

HTTP-Proxy-Servlet 50 929 66 11.6 1.3

smartsheet-java-sdk 21 598 300 28.5 14.3

zt-exec 15 15 2 1 0.1

commons-imaging 20 1064 57 53.2 2.9

jscep 50 1706 213 34.1 4.3

commons-lang 20 1220 61 61.0 3.0

datasketches-java 48 96 34 2 0.7

commons-dbcp 20 40 2 2 0.1

stream-lib 20 300 16 15 0.8

commons-io 20 1795 94 89.8 4.7

Across projects 676 13760 1151 20.2 1.7

Second, the further reduction

of average RV overhead resulting

from combining RPP with RPS

can be seen by comparing the

solid and striped bars in Figure 3.

Doing so shows that RPP’s crit-

ical phase, plus RPS incurs less

overhead than using RPS alone.

VMS Results. Table 3 shows

VMS results; “sha#” is the num-

ber of versions that we evalu-

ate per project, the “mop” and

“VMS” columns show the sum

and average of violations found

per version. Recall that VMS

does not reduce RV runtime over-

head; rather, it aims to show only

new violations. We ﬁnd that us-

ing VMS shows much fewer vi-

olations than RPS or JavaMOP.

Speciﬁcally, across all evaluated projects (see “Across projects” row), VMS shows

only 1.7 violations per version, compared to 20.2 violations with JavaMOP—

an average reduction of 11.8×. The project with the most average reduction—

31.3×—is embedded-jmxtrans. Fractional values that are less than 1.0 in the per

project average rows show the number of violations shown every ten versions, on

10 Yorihiro et al.

average. For example, in jbehave-junit-runner, VMS shows an average of one

violation in every ten versions, but JavaMOP shows ﬁve violations per version.

Our manual analysis shows that all RPS variants are safe—they do not miss

any new violation that VMS reports. (Like in our original paper, we assume that

VMS reports all new violations.) These results on VMS and safety are in line

with ﬁndings from our original paper. So, users will likely feel less swamped by

a deluge of violations that RV shows if run from scratch after every change.

Limitations. eMOP only supports JUnit; it does not yet work for other testing

frameworks, e.g., TestNG. eMOP’s bytecode instrumentation sometimes clashes

with the instrumentation that open-source projects already use for non-RV rea-

sons. Non-trivial engineering is needed to make instrumentation compatible. We

evaluated eMOP on 161 Java API speciﬁcations that are commonly used in

RV research. As more speciﬁcations are added, more optimizations will likely be

needed. eMOP uses JGit to map lines from old to new versions, so a few old vio-

lations can still be presented as new. More precise change-impact analyses, such

as semantic diﬀerencing [20] can be investigated and added as an option in the

future. eMOP’s use of a static change-impact analysis leads to two limitations.

First, eMOP may be unsafe if it does not ﬁnd classes that are impacted by the

changes due to the use of dynamic features like reﬂection. Second, it is possible

that the set of impacted classes would be more precise if analysis is done at the

method-level instead. eMOP may not work as-is for other kinds of speciﬁcations

than the kinds of API-level speciﬁcations that we check. Lastly, eMOP does not

yet control for test ﬂakiness [5, 42, 45, 47] or non-determinism.

Related Work. Researchers proposed many other RV tools other than Java-

MOP, e.g., [10, 22–25, 43]. eMOP is the ﬁrst to integrate evolution-aware RV

techniques into a popular build system. Evolution-awareness is not unique to

JavaMOP; future work can make other tools evolution aware. Tools for oﬄine

RV exist, e.g., [4]. It is not yet clear how to make oﬄine RV evolution aware. Plu-

gins helped make non-RV techniques easier to use. For example, Evosuite [15,16]

is a test generation technique that seemed to gain more popularity after plug-

ins for Maven, Eclipse, and IntelliJ were developed [2]. Also, after decades of

research on regression test selection (RTS) [6,18,19,33,39,46], RTS plugins that

are integrated with Maven or Ant [17, 36] led to recent adoption of RTS tools

among developers and renaissance in RTS research.

5 Conclusions and Future Work

eMOP brings the beneﬁts of evolution-aware RV to Maven. We ﬁnd that eMOP

reduces RV costs and makes it easier to use RV during regression testing. We

plan to evaluate eMOP on more projects, address some of its limitations, and

implement more features. eMOP is open-sourced; we hope that it will provide

a platform for advancing the research on integrating software testing and RV.

Acknowledgements. We thank the anonymous reviewers for their comments on

an earlier draft of this paper. This work was partially supported by funds from

the Google Cyber NYC Institutional Research Program and the US National

Science Foundation under Grant Nos. 2019277 and 2045596.

eMOP: A Maven Plugin for Evolution-Aware Runtime Veriﬁcation 11

References

1. ajc. https://www.eclipse.org/aspectj/doc/next/devguide/ajc-ref.html.

2. A. Arcuri, J. Campos, and G. Fraser. Unit test generation during software develop-

ment: Evosuite plugins for Maven, IntelliJ, and Jenkins. In ICST, pages 401–408,

2016.

3. E. Bartocci, Y. Falcone, A. Francalanza, and G. Reger. Introduction to runtime

veriﬁcation. In Lectures on Runtime Veriﬁcation: Introductory and Advanced Top-

ics, pages 1–33. 2018.

4. D. Basin, M. Harvan, F. Klaedtke, and E. Zălinescu. MONPOLY: Monitoring

usage-control policies. In RV, pages 360–364, 2012.

5. J. Bell, O. Legunsen, M. Hilton, L. Eloussi, T. Yung, and D. Marinov. DeFlaker:

Automatically detecting ﬂaky tests. In ICSE, pages 433–444, 2018.

6. S. Biswas, R. Mall, M. Satpathy, and S. Sukumaran. Regression test selection

techniques: A survey. Informatica, 35(3):289–321, 2011.

7. Collections_SynchronizedCollection Speciﬁcation. https://github.com/

owolabileg/property-db/blob/master/annotated-java-api/java/util/

Collections_SynchronizedCollection.mop.

8. SuiteHTMLReporter does not synchronize iteration on a synchronized list. https:

//github.com/testng-team/testng/pull/931.

9. JUnitXMLReporter does not synchronize the two synchronized collections when

iterating. https://github.com/testng-team/testng/pull/830.

10. J. Ellul and G. J. Pace. Runtime veriﬁcation of ethereum smart contracts. In

EDCC, pages 158–163, 2018.

11. eMOP Artifacts. https://github.com/SoftEngResearch/emop-artifacts.

12. eMOP GitHub Page. https://github.com/SoftEngResearch/emop.

13. Y. Falcone, K. Havelund, and G. Reger. A tutorial on runtime veriﬁcation. In

EDSS, pages 141–175. 2013.

14. Y. Falcone, S. Krstić, G. Reger, and D. Traytel. A taxonomy for classifying runtime

veriﬁcation tools. In Runtime Veriﬁcation, pages 241–262, 2018.

15. G. Fraser and A. Arcuri. Evosuite: Automatic test suite generation for object-

oriented software. In FSE, pages 416–419, 2011.

16. G. Fraser and A. Arcuri. A large-scale evaluation of automated unit test generation

using Evosuite. TOSEM, 24(2):1–42, 2014.

17. M. Gligoric, L. Eloussi, and D. Marinov. Ekstazi: Lightweight test selection. In

ICSE Demo, pages 713–716, 2015.

18. M. Gligoric, L. Eloussi, and D. Marinov. Practical regression test selection with

dynamic ﬁle dependencies. In ISSTA, pages 211–222, 2015.

19. T. L. Graves, M. J. Harrold, J.-M. Kim, A. Porter, and G. Rothermel. An empirical

study of regression test selection techniques. TOSEM, 10(2):184–208, 2001.

20. A. Gyori, S. K. Lahiri, and N. Partush. Reﬁning interprocedural change-impact

analysis using equivalence relations. In ISSTA, pages 318–328, 2017.

21. A. Gyori, B. Lambeth, A. Shi, O. Legunsen, and D. Marinov. NonDex: a tool for

detecting and debugging wrong assumptions on Java API speciﬁcations. In FSE

Demo, pages 993–997, 2016.

22. S. Hallé and R. Khoury. Event stream processing with BeepBeep 3. In RV-CuBES,

pages 81–88, 2017.

23. K. Havelund. Rule-based runtime veriﬁcation revisited. STTT, 17:143–170, 2015.

24. K. Havelund and D. Peled. Eﬃcient runtime veriﬁcation of ﬁrst-order temporal

properties. In SPIN, pages 26–47, 2018.

12 Yorihiro et al.

25. K. Havelund, D. Peled, and D. Ulus. First-order temporal logic monitoring with

BDDs. FMSD, 56(1-3):1–21, 2020.

26. K. Havelund and G. Roşu. Monitoring programs using rewriting. In ASE, pages

135–143, 2001.

27. java.util.Collections. https://docs.oracle.com/javase/8/docs/api/java/util/

Collections.html.

28. JGit. http://www.eclipse.org/jgit.

29. D. Jin, P. O. Meredith, C. Lee, and G. Roşu. JavaMOP: Eﬃcient parametric

runtime monitoring framework. In ICSE Demo, pages 1427–1430, 2012.

30. M. Kim, S. Kannan, I. Lee, O. Sokolsky, and M. Viswanathan. Java-MaC: A

run-time assurance tool for Java programs. In RV, pages 218–235, 2001.

31. C. Lee, D. Jin, P. O. Meredith, and G. Roşu. Towards categorizing and formalizing

the JDK API. Technical report, Computer Science Dept., UIUC, 2012.

32. O. Legunsen, N. Al Awar, X. Xu, W. U. Hassan, G. Roşu, and D. Marinov. How

eﬀective are existing Java API speciﬁcations for ﬁnding bugs during runtime veri-

ﬁcation? ASEJ, 26(4):795–837, 2019.

33. O. Legunsen, F. Hariri, A. Shi, Y. Lu, L. Zhang, and D. Marinov. An extensive

study of static regression test selection in modern software evolution. In FSE,

pages 583–594, 2016.

34. O. Legunsen, W. U. Hassan, X. Xu, G. Roşu, and D. Marinov. How good are the

specs? A study of the bug-ﬁnding eﬀectiveness of existing Java API speciﬁcations.

In ASE, pages 602–613, 2016.

35. O. Legunsen, D. Marinov, and G. Rosu. Evolution-aware monitoring-oriented pro-

gramming. In ICSE NIER, pages 615–618, 2015.

36. O. Legunsen, A. Shi, and D. Marinov. STARTS: STAtic Regression Test Selection.

In ASE Demo, pages 949–954, 2017.

37. O. Legunsen, Y. Zhang, M. Hadzi-Tanovic, G. Rosu, and D. Marinov. Techniques

for evolution-aware runtime veriﬁcation. In ICST, pages 300–311, 2019.

38. M. Leucker and C. Schallhart. A brief account of runtime veriﬁcation. In Formal

Languages and Analysis of Contract-Oriented Software, pages 293–303, 2007.

39. Y. Liu, J. Zhang, P. Nie, M. Gligoric, and O. Legunsen. More precise regression

test selection via reasoning about semantics-modifying changes. In ISSTA, pages

664–676, 2023.

40. B. Miranda, I. Lima, O. Legunsen, and M. d’Amorim. Prioritizing runtime veriﬁ-

cation violations. In ICST, pages 297–308, 2020.

41. P. W. O’Hearn. Continuous reasoning: Scaling the impact of formal methods. In

LICS, pages 13–25, 2018.

42. F. Palomba and A. Zaidman. Does refactoring of test smells induce ﬁxing ﬂaky

tests? In ICSME, pages 1–12, 2017.

43. G. Reger, H. C. Cruz, and D. Rydeheard. MarQ: monitoring at runtime with QEA.

In TACAS, pages 596–610, 2015.

44. F. B. Schneider. Enforceable security policies. TISSEC, 3(1):30–50, 2000.

45. A. Shi, A. Gyori, O. Legunsen, and D. Marinov. Detecting assumptions on de-

terministic implementations of non-deterministic speciﬁcations. In ICST, pages

80–90, 2016.

46. A. Shi, M. Hadzi-Tanovic, L. Zhang, D. Marinov, and O. Legunsen. Reﬂection-

aware static regression test selection. PACML, 3(OOPSLA):1–29, 2019.

47. A. Shi, W. Lam, R. Oei, T. Xie, and D. Marinov. iFixFlakies: A framework for

automatically ﬁxing order-dependent ﬂaky tests. In FSE, pages 545–555, 2019.

48. About sureﬁre. https://maven.apache.org/surefire.