Tracing as a Service

Spencer is an online service that lets you analyse large traces of programs running on the JVM using custom queries. Queries in Spencer are composable: if you have one analysis, you can refine it using others.

Queries

Queries in Spencer are expressions that return a set of object IDs. They implement a test: does an object's usage fulfill a certain definition or not?

Spencer distinguishes between primitive queries and composed queries. Primitive queries are the basic building blocks that are implemented "natively" in the backend of the service.

The primitive queries are:

Query | Meaning
MutableObj() | All objects that are changed outside their constructor.
ImmutableObj() | All objects that are never changed outside their constructor.
UniqueObj() | All objects that are never aliased.
HeapUniqueObj() | All objects that are never aliased.
TinyObj() | All objects that do not have or do not use reference type fields.
StackBoundObj() | All objects that are never aliased.
AgeOrderedObj() | All objects that are only holding field references to objects created before them.
ReverseAgeOrderedObj() | All objects that are only holding field references to objects created after them.
InstanceOf(java.lang.String) | All objects that are instances of class java.lang.String.
AllocatedAt(String.java:1933) | All objects that were allocated at String.java:1933.
Obj() | All objects that were traced.
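To make the idea of a query as "a set of object IDs" concrete, here is a minimal Python sketch of three primitive queries over a toy in-memory stand-in for the objects table. The dictionary layout and the names obj, allocated_at and immutable_obj are hypothetical and are not Spencer's backend API. ImmutableObj() is computed as the complement of MutableObj(), which follows from the two definitions in the table above; the set of mutable objects is simply passed in.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for the objects table: object ID -> allocation site
# (None when no site is known).
AllocSites = Dict[int, Optional[str]]


def obj(sites: AllocSites) -> Set[int]:
    """Obj(): all traced objects."""
    return set(sites)


def allocated_at(sites: AllocSites, site: str) -> Set[int]:
    """AllocatedAt(site): objects whose recorded allocation site is `site`."""
    return {oid for oid, s in sites.items() if s == site}


def immutable_obj(sites: AllocSites, mutable: Set[int]) -> Set[int]:
    """ImmutableObj(): every traced object that MutableObj() does not select."""
    return obj(sites) - mutable


sites = {
    21727: "AbstractStringBuilder.java:137",
    21747: "AbstractStringBuilder.java:137",
    36815: "Resource.java:117",
}
print(sorted(allocated_at(sites, "AbstractStringBuilder.java:137")))  # [21727, 21747]
print(sorted(immutable_obj(sites, mutable={21727})))  # [21747, 36815]
```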

Spencer's power lies in the fact that these queries all return the same kind of structure: sets of objects! This restriction makes it possible to compose queries into larger ones, like so:

Query | Meaning
And(ImmutableObj() AllocatedAt(String.java:1933)) | All objects that are never changed outside their constructor and were allocated at String.java:1933.
Or(UniqueObj() ImmutableObj()) | All objects that are never aliased or are never changed outside their constructor.
Deeply(Or(UniqueObj() ImmutableObj())) | All objects that are never aliased or are never changed outside their constructor, and for which the same is true of all reachable objects.
HeapDeeply(Or(UniqueObj() ImmutableObj())) | All objects that are never aliased or are never changed outside their constructor, and for which the same is true of all heap-reachable objects.
ReachableFrom(AllocatedAt(String.java:1933)) | All objects that are reachable from objects that were allocated at String.java:1933.
HeapReachableFrom(AllocatedAt(String.java:1933)) | All objects that are heap-reachable from objects that were allocated at String.java:1933.
CanReach(AllocatedAt(String.java:1933)) | All objects that are able to reach objects that were allocated at String.java:1933.
CanHeapReach(AllocatedAt(String.java:1933)) | All objects that are able to heap-reach objects that were allocated at String.java:1933.
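Because every query yields a set, the composition operators above can be sketched as set operations plus graph traversal. The Python sketch below assumes a toy heap graph (object ID to the IDs it references) and takes "reachable" to mean reachable via one or more references; all function names are hypothetical. Under the further assumption that heap reachability follows only field references, the Heap variants could reuse the same functions on a graph built from field references alone.

```python
from typing import Dict, Set

# Hypothetical toy heap graph: object ID -> IDs of the objects it references.
Graph = Dict[int, Set[int]]


def and_(*queries: Set[int]) -> Set[int]:
    """And(q1 q2 ...): objects selected by every sub-query."""
    return set.intersection(*queries)


def or_(*queries: Set[int]) -> Set[int]:
    """Or(q1 q2 ...): objects selected by at least one sub-query."""
    return set.union(*queries)


def reachable(graph: Graph, start: int) -> Set[int]:
    """Objects reachable from `start` by following one or more references."""
    seen: Set[int] = set()
    stack = list(graph.get(start, ()))
    while stack:
        o = stack.pop()
        if o not in seen:
            seen.add(o)
            stack.extend(graph.get(o, ()))
    return seen


def reachable_from(graph: Graph, q: Set[int]) -> Set[int]:
    """ReachableFrom(q): objects reachable from some object in q."""
    return set().union(*(reachable(graph, o) for o in q))


def can_reach(graph: Graph, q: Set[int]) -> Set[int]:
    """CanReach(q): objects from which some object in q is reachable."""
    return {o for o in graph if reachable(graph, o) & q}


def deeply(graph: Graph, q: Set[int]) -> Set[int]:
    """Deeply(q): objects in q all of whose reachable objects are in q, too."""
    return {o for o in q if reachable(graph, o) <= q}


graph = {1: {2}, 2: {3}, 3: set(), 4: {1}}
unique, immutable = {1, 2, 4}, {3}
print(sorted(deeply(graph, unique)))                  # []
print(sorted(deeply(graph, or_(unique, immutable))))  # [1, 2, 3, 4]
print(sorted(reachable_from(graph, {1})))             # [2, 3]
print(sorted(can_reach(graph, {3})))                  # [1, 2, 4]
```

Expressing Deeply(q) in terms of q itself is what makes the nesting in Deeply(Or(...)) work: any set-valued query can be plugged in.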

Assembling queries from smaller parts is nice, but sometimes you'll want to compare different queries with each other: how prevalent are mutability, stationarity, and immutability relative to each other? You can compare queries by separating them with slashes, like so: MutableObj()/StationaryObj()/ImmutableObj().

These combined queries bring up an interactive visualisation; try clicking on the query name labels.
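A slash-separated comparison can be thought of as a list of independent queries whose result sets are then set against each other. The sketch below splits such a string only at slashes outside parentheses (an assumption about the grammar, so that an allocation site containing "/" would stay intact) and reports each result's share of all traced objects. split_comparison and prevalence are hypothetical helpers, not part of Spencer.

```python
from typing import Dict, List, Set


def split_comparison(query: str) -> List[str]:
    """Split "A/B/C" into sub-queries at slashes outside any parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "/" and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def prevalence(results: Dict[str, Set[int]], everything: Set[int]) -> Dict[str, float]:
    """Share of all traced objects (Obj()) selected by each sub-query."""
    return {name: len(ids) / len(everything) for name, ids in results.items()}


print(split_comparison("MutableObj()/StationaryObj()/ImmutableObj()"))
# ['MutableObj()', 'StationaryObj()', 'ImmutableObj()']
```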

The design of Queries

What's in the data?

Spencer contains many data sets. Each data set is a program trace (in preprocessed form) obtained by instrumenting and running a program. The data of a trace are distributed over several tables. Although users do not access these tables directly (the primitive queries do), anyone who wants to help Spencer grow must understand the data they contain.

The objects table

The objects table contains basic information on every object that was encountered during tracing. See for yourself: below are a few records from our database. Each object is identified by a unique numeric ID and has an optional allocation site and the event index at which it was created; we also record the last time the object was used (loaded, modified, read from, or called).

Records with a negative ID, such as -5425 in the uses table below, are pseudo objects that represent a class. Every access to a static field or method is reported as an access to the corresponding pseudo object in the rest of the data.

You might also notice that the allocation sites are optional. This is a technical limitation with three causes. First, some class files do not contain location information, so there is no line we could give. Second, some objects are created very early in the startup process, before the instrumentation is running. Third, some objects are allocated by native code, which we do not instrument.

By the way, these tables are interactive. Try hovering your mouse over the event indices: all earlier indices (with a lower number) turn red, and all later ones (with a higher number) turn green.
You may also want to click one of the links next to object IDs or allocation sites; they bring up the query system. Don't get lost there yet, though.

object id allocation site? event idx of allocation event idx of last usage
8370 ZipCoder.java:89
21727 AbstractStringBuilder.java:137
36810 ZipFile.java:393
36811 ZipFile.java:393
36812 ZipFile.java:393
21735 StringCoding.java:79
21737 Benchmark.java:627
21738 Benchmark.java:619
21747 AbstractStringBuilder.java:137
36814 URLClassLoader.java:462
36815 Resource.java:117
36816 Resource.java:117
21755 StringCoding.java:79
36817 Resource.java:117
21757 Benchmark.java:627
... ... ... ...
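A row of the objects table could be modelled as follows. This is only a sketch: the field names are hypothetical, the example values are made up (the event indices are not shown in the excerpt above), and the only rules it encodes are the ones stated in the text: the allocation site may be missing, and negative IDs denote class pseudo objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectRecord:
    """One row of the objects table (field names are hypothetical)."""
    oid: int
    alloc_site: Optional[str]  # None if no source location is available
    alloc_idx: int             # event index of the allocation
    last_use_idx: int          # event index of the last load, modification, read or call

    @property
    def is_class_pseudo_object(self) -> bool:
        # Negative IDs stand for classes; static accesses are reported against them.
        return self.oid < 0

    def in_use_at(self, idx: int) -> bool:
        """True if event `idx` lies between allocation and last use (inclusive)."""
        return self.alloc_idx <= idx <= self.last_use_idx


# Hypothetical example values, not taken from a real trace.
rec = ObjectRecord(oid=1, alloc_site=None, alloc_idx=10, last_use_idx=50)
print(rec.in_use_at(30), rec.in_use_at(60))  # True False
```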

The refs table

The refs table contains the "graph structure" of a program trace.

References in this table are either variable or field references. Each reference has a caller (the object holding the reference), a callee (the object being referenced, or 0), and a name (local variable names are not recorded; variables are simply numbered as var_{i}).

References also record when they were established (by setting a field or variable, or by passing a method argument) and when they were deleted (when a field or variable is overwritten, for local variables when the method call returns, and for fields when the object is no longer used).

caller kind name callee start end? thread
6025 holds a var ref called var_1 to 3360 67971 none established by main
6023 holds a var ref called var_1 to 3360 67964 none established by main
6021 holds a var ref called var_1 to 3360 67957 none established by main
6019 holds a var ref called var_1 to 3360 67949 none established by main
6019 holds a var ref called var_1 to 3360 67946 none established by main
5439 holds a field ref called current to 855 298 none established by main
5439 holds a field ref called next to 855 283 none established by main
... ... ... ... ... ... ...
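The refs table can be read as a time-varying graph: an edge exists only between the event that established it and the event that deleted it. The sketch below models rows with a hypothetical Ref dataclass and asks which references an object holds at a given event index. Treating start as inclusive and end as exclusive, and treating callee 0 as "no object", are assumptions; the example rows are copied from the table above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Ref:
    caller: int          # object holding the reference
    kind: str            # "var" or "field"
    name: str            # field name, or var_<i> for a local variable
    callee: int          # referenced object; 0 assumed to mean "no object"
    start: int           # event index at which the reference was established
    end: Optional[int]   # event index at which it was deleted, None if never

    def live_at(self, idx: int) -> bool:
        # Assumption: live from `start` (inclusive) until `end` (exclusive).
        return self.start <= idx and (self.end is None or idx < self.end)


def live_refs_from(refs: Iterable[Ref], caller: int, idx: int) -> List[Ref]:
    """References to actual objects held by `caller` at event index `idx`."""
    return [r for r in refs if r.caller == caller and r.callee != 0 and r.live_at(idx)]


# Rows taken from the table above.
refs = [
    Ref(5439, "field", "current", 855, 298, None),
    Ref(5439, "field", "next", 855, 283, None),
    Ref(6019, "var", "var_1", 3360, 67946, None),
]
print([r.name for r in live_refs_from(refs, 5439, 290)])  # ['next']
print([r.name for r in live_refs_from(refs, 5439, 300)])  # ['current', 'next']
```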

The uses table

Accesses to object fields end up in the uses table. Reads and writes of primitive-type fields are represented by events of kind read and modify; reads and writes of reference-type fields by events of kind fieldload and fieldstore. Reference-type variables are treated as if they were fields exclusive to a method; their loads and stores are recorded as varload and varstore, respectively. Variables of primitive types are not traced.

caller kind name callee idx thread
5439 executes read of field size of 836 166 in main
5439 executes fieldstore of field current of 5439 162 in main
5439 executes fieldstore of field next of 5439 161 in main
5439 executes modify of field expectedModCount of 5439 157 in main
5439 executes read of field modCount of 836 156 in main
5432 executes read of field fieldOffset of 5432 118 in main
5429 executes read of field fieldOffset of 5429 82 in main
5429 executes fieldload of field unsafe of -5425 80 in main
5429 executes modify of field isReadOnly of 5429 71 in main
5429 executes varload of field var_1 of 5429 67 in main
5427 executes read of field fieldOffset of 5427 41 in main
5427 executes varload of field var_1 of 5427 40 in main
5427 executes fieldload of field unsafe of -5425 39 in main
5427 executes varload of field var_1 of 5427 38 in main
5427 executes varstore of field var_1 of 5427 33 in main
5427 executes modify of field isReadOnly of 5427 30 in main
5427 executes varstore of field var_1 of 5427 29 in main
5427 executes varload of field var_1 of 5427 27 in main
5427 executes varload of field var_1 of 5427 26 in main
... ... ... ... ... ...
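Primitive queries such as MutableObj() can, in principle, be derived from these tables. The sketch below assumes that modify and fieldstore are the event kinds that change an object (varstore is left out because it writes a method-local variable) and that constructor spans are supplied from the <init> calls recorded in the calls table, described next. It is an illustration under those assumptions, not Spencer's actual implementation; the rows are copied from the tables in this section.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One row of the uses table: (caller, kind, field name, callee, event idx).
Use = Tuple[int, str, str, int, int]

# Assumption: these kinds change the callee's state. varstore is excluded
# because it writes a method-local variable, not a field of the callee.
WRITE_KINDS = {"modify", "fieldstore"}


def mutable_obj(uses: Iterable[Use],
                ctor_spans: Dict[int, List[Tuple[int, int]]]) -> Set[int]:
    """Sketch of MutableObj(): objects written to outside their constructor.

    ctor_spans (hypothetical input) maps an object ID to the (start, end)
    event indices of <init> calls on that object.
    """
    result: Set[int] = set()
    for _caller, kind, _name, callee, idx in uses:
        if kind not in WRITE_KINDS:
            continue
        in_ctor = any(start <= idx <= end for start, end in ctor_spans.get(callee, []))
        if not in_ctor:
            result.add(callee)
    return result


uses = [
    (5439, "fieldstore", "current", 5439, 162),
    (5439, "fieldstore", "next", 5439, 161),
    (5439, "modify", "expectedModCount", 5439, 157),
    (5439, "read", "modCount", 836, 156),
    (5429, "modify", "isReadOnly", 5429, 71),
]
ctor_spans = {5439: [(153, 191)]}  # from "5439 calls <init> on 5439 ... 153 191"
print(mutable_obj(uses, ctor_spans))  # {5429}
```

With this excerpt, 5439 is not reported because its writes at events 157 to 162 fall inside its <init> call (153 to 191); 5429 is reported only because no constructor call on it is part of the excerpt.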

The calls table

Calls on objects are stored in the calls table, together with timing and thread information. Since every call is tagged with the event indices of its start and end, we know that if two calls in the same thread are nested temporally, they must also be nested on the call stack. The "apropos" view, accessible from object links, exploits this for visualisation.

Hint: try brushing along the event indices in the start column. The pattern in which the events in the end column turn red tells you something about the call stack! Try to figure it out as an exercise :)

caller name callee callsite start end thread
5439 calls <init> on 5439 at HashMap.java:1013 153 191 in main
5439 calls nextNode on 5439 at HashMap.java:1471 197 231 in main
5439 calls nextNode on 5439 at HashMap.java:1471 237 285 in main
5439 calls nextNode on 5439 at HashMap.java:1471 291 341 in main
5445 calls getInt on 5445 at UnsafeQualifiedIntegerFieldAccessorImpl.java:38 430 436 in main
5445 calls getInt on 5445 at UnsafeQualifiedIntegerFieldAccessorImpl.java:38 796 802 in main
5445 calls getInt on 5445 at UnsafeQualifiedIntegerFieldAccessorImpl.java:38 896 902 in main
5445 calls getInt on 5445 at UnsafeQualifiedIntegerFieldAccessorImpl.java:38 996 1002 in main
5445 calls getInt on 5445 at UnsafeQualifiedIntegerFieldAccessorImpl.java:38 1096 1102 in main
5445 calls getInt on 5445 at UnsafeQualifiedIntegerFieldAccessorImpl.java:38 1196 1202 in main
... ... ... ... ... ...
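The nesting argument above translates directly into a stack-based reconstruction of call depth: sort the calls of a thread by start index, and every call that is still open when a new one starts must enclose it. The sketch assumes that calls within one thread are either nested or disjoint, as the argument implies; the tuple layout and the name stack_depths are hypothetical.

```python
from typing import Dict, List, Tuple

# Reduced calls-table row: (method name, start idx, end idx, thread).
Call = Tuple[str, int, int, str]


def stack_depths(calls: List[Call]) -> Dict[int, int]:
    """Map each call's position in `calls` to its nesting depth in its thread.

    Assumes calls within a thread are either nested or disjoint in time.
    """
    open_ends: Dict[str, List[int]] = {}  # per thread: end indices of open calls
    depths: Dict[int, int] = {}
    for i in sorted(range(len(calls)), key=lambda k: calls[k][1]):
        _name, start, end, thread = calls[i]
        stack = open_ends.setdefault(thread, [])
        while stack and stack[-1] < start:  # enclosing call already returned
            stack.pop()
        depths[i] = len(stack)  # calls still open enclose this one
        stack.append(end)
    return depths


# Hypothetical calls: inner and inner2 run inside outer; bg runs in another thread.
calls = [
    ("outer", 1, 10, "main"),
    ("inner", 2, 5, "main"),
    ("inner2", 6, 9, "main"),
    ("bg", 3, 4, "worker"),
]
depths = stack_depths(calls)
print([depths[i] for i in range(len(calls))])  # [0, 1, 1, 0]
```

Applied to the rows shown in the table above, every call gets depth 0, because none of those excerpted calls overlap.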

Future Work: The sync table

Event indices are numeric, which wrongly suggests that events are ordered by index even across threads. The fact that two parallel events ended up in the database in a particular order does not imply an actual happens-before relation between them.

In the future, Spencer might trace synchronisation points between threads that establish happens-before relations across thread boundaries.

sync1-thread sync1-file sync1-line sync1-event-before sync2-thread sync2-file sync2-line sync2-event-after
... ... ... ... ... ... ... ...
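If such a table existed, happens-before could be computed from program order within each thread plus the recorded synchronisation edges, instead of from raw event indices. The sketch below assumes each sync record becomes an edge from (sync1-thread, sync1-event-before) to (sync2-thread, sync2-event-after); that edge format and the name happens_before are hypothetical, since the table is future work.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (thread, event index)


def happens_before(a: Event, b: Event, sync: List[Tuple[Event, Event]]) -> bool:
    """True if event a happens before event b.

    Ordering comes only from program order within a thread and from sync edges
    (x, y), meaning "x happens before y"; raw event indices are never compared
    across threads.
    """
    # reached[t] = lowest index in thread t known to be `a` or to happen after it.
    reached: Dict[str, int] = {a[0]: a[1]}
    changed = True
    while changed:  # propagate along sync edges until nothing changes
        changed = False
        for (x_thread, x_idx), (y_thread, y_idx) in sync:
            if (x_thread in reached and x_idx >= reached[x_thread]
                    and y_idx < reached.get(y_thread, y_idx + 1)):
                reached[y_thread] = y_idx
                changed = True
    return a != b and b[0] in reached and b[1] >= reached[b[0]]


# Hypothetical sync edge: event 10 in main precedes event 11 in worker.
sync = [(("main", 10), ("worker", 11))]
print(happens_before(("main", 5), ("worker", 17), sync))   # True
print(happens_before(("main", 12), ("worker", 17), sync))  # False
print(happens_before(("worker", 17), ("main", 12), sync))  # False
```

In the example, main's event 12 has a lower index than worker's event 17, yet neither happens before the other: exactly the situation that numeric event indices alone cannot express.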