
Oscar Nierstrasz
Software Composition Group

scg.unibe.ch

What I learned from Rascal

SWAT talk, Oct 25 2013
SCG talk, Oct 29 2013

Monday, October 28, 13

Roadmap

First Steps

Warming up

M3 to MSE

Syntax, Islands, and Water

The Good, the Bad ...

First Steps


4

What’s Rascal?

Rascal is a purely functional metaprogramming language for source code analysis.

Rascal is a purely functional model transformation language and language workbench.


5

Free download; online help; stackoverflow; github issue tracking ...


6

module basic::hello

import IO;

void hello() {
    println("hello");
}

modules

types

functions

REPL

Eclipse integration

Rascal is a purely functional language with Java-ish syntax. You can run it from a shell, but it is really meant to be used within Eclipse.


7

Don’t read this first!

Read these recipes

Then the manual!

Missing a gentle introduction

The examples (“recipes”) are nice, but there should be more of them, and they should be introduced step-by-step as exercises.


Warming up


9

Polymorphism detection

First test case for Rascal:

Can polymorphic call sites be more effectively detected statically (false positives) or dynamically (false negatives)?

1. Implement static heuristics based on M3

2. ...

3. ...

This first project was designed to see how well Rascal and M3 are suited to static and dynamic analysis. Since M3 does not go below the method level, only some simple heuristics were implemented.


10

What’s M3?

M3 is a language-independent meta-model for software analysis

Like FAMIX ...

module analysis::m3::Core
...

data M3 = m3(loc id);

// maps declarations to where they are declared. contains any kind of data or type or code declaration (classes, fields, methods, variables, etc. etc.)
anno rel[loc name, loc src] M3@declarations;
// assigns types to declared source code artifacts
anno rel[loc name, TypeSymbol typ] M3@types;
// maps source locations of usages to the respective declarations
anno rel[loc src, loc name] M3@uses;
// what is logically contained in what else (not necessarily physically, but usually also)
anno rel[loc from, loc to] M3@containment;
// error messages and warnings produced while constructing a single m3 model
anno list[Message messages] M3@messages;
// convenience mapping from logical names to end-user readable (GUI) names, and vice versa
anno rel[str simpleName, loc qualifiedName] M3@names;
// comments and javadoc attached to declared things
anno rel[loc definition, loc comments] M3@documentation;
// modifiers associated with declared things
anno rel[loc definition, Modifier modifier] M3@modifiers;

...

Each annotation is a list or a relation ...


11

M3 for Java

module lang::java::m3::Core
...
extend analysis::m3::Core;

...

// classes extending classes and interfaces extending interfaces
anno rel[loc from, loc to] M3@extends;
// classes implementing interfaces
anno rel[loc from, loc to] M3@implements;
// methods calling each other (including constructors)
anno rel[loc from, loc to] M3@methodInvocation;
// code using data (like fields)
anno rel[loc from, loc to] M3@fieldAccess;
// using a type literal in some code (types of variables, annotations)
anno rel[loc from, loc to] M3@typeDependency;
// which method override which other methods
anno rel[loc from, loc to] M3@methodOverrides;

...

extends M3 with Java specifics


12

m3(|project://p2-SnakesAndLadders|)[
  @fieldAccess={
    <|java+method://p2-SnakesAndLadders/snakes/Game/toString()|,|java+field://p2-SnakesAndLadders/snakes/Game/squares|>,
    <|java+method://p2-SnakesAndLadders/snakes/Game/addSquares(int)|,|java+field://p2-SnakesAndLadders/snakes/Game/squares|>,
    ...
  },
  @extends={
    <|java+class://p2-SnakesAndLadders/snakes/Snake|,|java+class://p2-SnakesAndLadders/snakes/Ladder|>,
    <|java+class://p2-SnakesAndLadders/snakes/Ladder|,|java+class://p2-SnakesAndLadders/snakes/Square|>,
    ...
  },
  @methodInvocation={
    <|java+method://p2-SnakesAndLadders/snakes/SimpleGameTest/move8jillWins(snakes.Game)|,|java+method://p2-SnakesAndLadders/snakes/Game/winner()|>,
    <|java+variable://p2-SnakesAndLadders/snakes/Game/toString()/buffer|,|java+constructor://p2-SnakesAndLadders/java/lang/StringBuffer/StringBuffer()|>,
    ...
  },
  @typeDependency={
    <|java+method://p2-SnakesAndLadders/snakes/Square/nextSquare()|,|java+primitiveType://p2-SnakesAndLadders/int|>,
    <|java+method://p2-SnakesAndLadders/snakes/DieTest/testMinReached()|,|java+class://p2-SnakesAndLadders/snakes/DieTest|>,
    ...

Rascal talks to Eclipse to build an M3 model

M3 snakesM3 = createM3FromEclipseProject(|project://p2-SnakesAndLadders|);

URIs (“locations”)


13

Most M3 queries are 1-liners

@doc { Return the set of (all) subtypes of a type. }
@memo public set[loc] subtypes(M3 m, loc aType) =
    invert(getDeclaredTypeHierarchy(m)+)[aType];

Memoized function

Transitive closure

Invert relation
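The three ingredients of this one-liner (transitive closure, relation inversion, and taking the image of one element) can be spelled out in a few lines. The following is a Python sketch rather than Rascal, with relations modeled as sets of pairs; the sample hierarchy is hypothetical and only loosely based on the class names shown earlier.

```python
from collections import defaultdict


def transitive_closure(rel):
    """Return the transitive closure of a set of (a, b) pairs."""
    closure = set(rel)
    while True:
        by_source = defaultdict(set)
        for a, b in closure:
            by_source[a].add(b)
        # Compose the closure with itself and keep only the new pairs.
        new_pairs = {(a, c) for a, b in closure for c in by_source[b]} - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def invert(rel):
    """Swap the two columns of a binary relation."""
    return {(b, a) for a, b in rel}


def image(rel, key):
    """Return all right-hand values related to key (like rel[key] in Rascal)."""
    return {b for a, b in rel if a == key}


def subtypes(hierarchy, a_type):
    """All direct and indirect subtypes, given (subtype, supertype) pairs."""
    return image(invert(transitive_closure(hierarchy)), a_type)


# Hypothetical hierarchy; each pair is (subtype, supertype).
hierarchy = {("Snake", "Ladder"), ("Ladder", "Square"), ("FirstSquare", "Square")}
print(sorted(subtypes(hierarchy, "Square")))  # ['FirstSquare', 'Ladder', 'Snake']
```

Memoization is left out of this sketch; in Python it would only apply if the relation were passed as a hashable value such as a frozenset.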


14

@doc { Returns the source URI for the method URI. }
public loc getSource(M3 m, loc method) =
    getUniqueElement(m@declarations[method]);

@doc { Returns unique element of a set, or fails. }
private &T getUniqueElement(set[&T] s) {
    assert size(s) == 1;
    return getOneFrom(s);
}

Many M3 relations are actually (partial) functions

Note the use of generics and assertions.
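The same idea, treating a relation as a partial function by insisting that a lookup yields exactly one result, might look as follows in Python. This is an illustrative sketch, not the author's code: the relation contents are hypothetical, and a `ValueError` stands in for the Rascal assertion.

```python
def get_unique_element(items):
    """Return the only element of a set, or fail loudly."""
    if len(items) != 1:
        raise ValueError(f"expected exactly one element, got {len(items)}")
    return next(iter(items))


def image(rel, key):
    """Return all right-hand values related to key."""
    return {b for a, b in rel if a == key}


def get_source(declarations, entity):
    """Look up the single source location declared for an entity."""
    return get_unique_element(image(declarations, entity))


# Hypothetical declarations relation: (entity, source file).
declarations = {
    ("snakes/Game", "src/snakes/Game.java"),
    ("snakes/Die", "src/snakes/Die.java"),
}
print(get_source(declarations, "snakes/Game"))  # src/snakes/Game.java
```

Looking up an entity with no declaration, or with several, raises an error instead of silently returning an arbitrary value.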


15

@doc { Return classes with subclasses and interfaces with >1 implementations. }
public set[loc] polymorphTypes(M3 m) {
    set[loc] types = getDeclaredTypeHierarchy(m)<0>;
    return { t | t <- types, (isClass(t) && size(subtypes(m,t)) > 0)
                          || (isInterface(t) && size(subtypes(m,t)) > 1) };
}

@doc { Returns the type symbol for a given class loc. }
public TypeSymbol getTypeSymbol(M3 m, loc t) = getUniqueElement(m@types[t]);

@doc { Return fields declared to be of polymorphic types. }
public set[loc] polymorphFields(M3 m) {
    set[TypeSymbol] ts = { getTypeSymbol(m,t) | t <- polymorphTypes(m) };
    return { t | t <- invert(m@types)[ts], isField(t) };
}

Polymorphic candidates

Some heuristics are easy to express; others would require access to the AST ...

private loc squareField = |java+field://p2-SnakesAndLadders/snakes/Player/square|;

test bool testPolymorphFields() = polymorphFields(snakes()) == { squareField };


M3 to MSE

https://github.com/onierstrasz/rascal-m3-to-mse.git




(
    (FAMIX.Namespace (id: 182)
        (name 'snakes'))
    (FAMIX.Class (id: 8)
        (name 'DieTest')
...

m3(|project://...|)[ @fieldAccess={ ... },...

17

Idea: steal models from Rascal

Especially interesting for languages other than Java!


18

Need IDs for all FAMIX entities

@doc { Returns a map from FAMIX entity values to unique IDs. }
@memo public map[value,int] idMap(M3 m) {
    set[value] entities = m@declarations<0> + m@declarations<1> // classes, methods, ...
        + primitiveTypes(m) // NB: TypeSymbols; all others are locations
        + importedTypes(m)
        + { unknownFieldType() }
        + m@methodInvocation<1> // external methods
        + m@fieldAccess<1> // external fields
        + m@extends + m@implements // inheritances
        + m@methodInvocation
        + m@fieldAccess
        + importedPackages(m);
    return index(entities);
}

NB: locations or TypeSymbols

entities we might need...

handy library function!

It was an iterative process to figure out what entities were needed ...
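The ID assignment amounts to collecting every value that needs an identifier into one set and numbering its elements. Below is a small Python sketch of that step. The `index` function is a stand-in written for illustration (it numbers elements from 0 in a deterministic order, which the Rascal library function may or may not do), and the entity values are hypothetical.

```python
def index(entities):
    """Map each distinct entity to a unique integer ID."""
    # Sorting by repr gives a stable order even for mixed value types.
    ordered = sorted(entities, key=repr)
    return {entity: number for number, entity in enumerate(ordered)}


# Hypothetical entities: declared things, a primitive type, and an
# inheritance pair, which needs its own ID as a FAMIX association.
declared = {"snakes/Game", "snakes/Die"}
primitive_types = {"int"}
inheritances = {("snakes/Snake", "snakes/Ladder")}

entities = declared | primitive_types | inheritances
id_map = index(entities)
print(len(id_map))  # 4
```

Because the inputs are unioned into a set first, an entity reachable through several relations still receives a single ID.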


19

Short cut: directly spit out MSE

@doc { Write the MSE for an Eclipse Java project to its source directory. }
public void writeMSE(M3 m) {
    loc file = m.id + "<m.id.authority>.mse";
    writeFile(file, "(\n");
    appendPackages(file, m);
    appendClasses(file, m);
    ...
    appendToFile(file, ")\n");
}

private void appendClasses(loc file, M3 m) {
    for (loc c <- classes(m) + interfaces(m) + anonClasses(m)) {
        appendToFile(file,
            "    (FAMIX.Class (id: <getID(m,c)>)
            '        (name \'<getClassName(c)>\')
            '        (container (ref: <getID(m, getClassPackage(m, c))>))
            '        (isInterface <isInterface(c)?true:false>))
            '");
        // TODO: modifiers, sourceAnchor ...
    }
}

Easy, but complicates debugging

Rascal string templates

Directly spitting out MSE means there is no way to check the consistency of the output


20

Debugging MSE is painful

Moose did not make it easy to track down errors in the generated MSE. A script to post-check for dangling references helped.
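The talk does not show the post-check script. A minimal Python sketch of such a check, assuming IDs and references always appear in the textual forms `(id: N)` and `(ref: N)` as in the MSE excerpts above, could look like this. The second class in the sample is hypothetical and deliberately refers to a missing ID.

```python
import re

# Assumed textual forms of definitions and references in an MSE file.
ID_PATTERN = re.compile(r"\(id:\s*(\d+)\)")
REF_PATTERN = re.compile(r"\(ref:\s*(\d+)\)")


def dangling_refs(mse_text):
    """Return the set of referenced IDs that are never defined."""
    defined = {int(n) for n in ID_PATTERN.findall(mse_text)}
    referenced = {int(n) for n in REF_PATTERN.findall(mse_text)}
    return referenced - defined


sample = """(
    (FAMIX.Namespace (id: 182)
        (name 'snakes'))
    (FAMIX.Class (id: 8)
        (name 'DieTest')
        (container (ref: 182)))
    (FAMIX.Class (id: 9)
        (name 'Die')
        (container (ref: 183)))
)"""
print(sorted(dangling_refs(sample)))  # [183]
```

A purely textual scan like this would be fooled by a quoted name that happens to contain `(id: N)`; a real check might parse the file instead.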


21

@doc { Return the package URI for a given class URI. }
public loc getClassPackage(M3 m, loc c) {
    set[loc] parents = parents(m)[c]?{};
    if (isEmpty(parents)) {
        return unknownPackage(c);
    }
    loc parent = getUniqueElement(parents);
    return isPackage(parent) ? parent : getClassPackage(m, parent);
}

Missing abstractions were easy to build

One of few functions that weren’t 1-liners.


22

private bool isPrimitive(\int()) = true;
private bool isPrimitive(\float()) = true;
private bool isPrimitive(\double()) = true;
private bool isPrimitive(\short()) = true;
private bool isPrimitive(\boolean()) = true;
private bool isPrimitive(\char()) = true;
private bool isPrimitive(\byte()) = true;
private bool isPrimitive(\long()) = true;
private bool isPrimitive(\void()) = true;
private bool isPrimitive(\null()) = true;
private bool isPrimitive(\array(_,_)) = true;
private bool isPrimitive(\typeParameter(_, _)) = true;
private default bool isPrimitive(TypeSymbol s) = false;

public set[TypeSymbol] primitiveTypes(M3 m) =
    { t | t <- types(m), isPrimitive(t) }
    + { t | t <- returnTypes(m), isPrimitive(t) }
    + { t | t <- parameterTypes(m), isPrimitive(t) };

@doc { Return the ID of the type of a field. }
public int declaredTypeID(M3 m, loc f) {
    try
        TypeSymbol ts = declaredTS(m, f);
    catch :
        return getID(m, unknownFieldType());
    if (isPrimitive(ts)) {
        return getID(m, ts);
    }
    return getID(m, location(ts));
}


23

public set[loc] locations(class(loc decl, _)) = { decl };
public set[loc] locations(interface(loc decl, _)) = { decl };
public set[loc] locations(method(loc decl, _, _, _)) = { decl };
public set[loc] locations(constructor(loc decl, _)) = { decl };
public set[loc] locations(enum(loc decl)) = { decl };
public set[loc] locations(typeParameter(decl,_)) = { decl };
public set[loc] locations(object()) = { unknownFieldType() }; // TEMPORARY HACK
public default set[loc] locations(TypeSymbol _) = {};

test bool testLocations1() = locations(\int()) == {};
test bool testLocations2() = locations(playerTS) == {playerClass};
test bool testLocations3() = locations(setSquareTS) == {setSquare};

public loc location(TypeSymbol ts) {
    assert(!isPrimitive(ts));
    return getUniqueElement(locations(ts));
}

@doc { Returns the locations of a set of TypeSymbols. }
public set[loc] locationsOf(set[TypeSymbol] tsSet) {
    return { location(ts) | ts <- tsSet, !isPrimitive(ts) };
}


24

@doc { Returns imported classes and interfaces, i.e., used, but not declared. }
public set[loc] importedTypes(M3 m) =
    usedTypes(m)
    + superTypes(m)
    + locationsOf(returnTypes(m))
    + locationsOf(parameterTypes(m))
    - m@declarations<0>;

public set[loc] superTypes(M3 m) = m@extends<1> + m@implements<1>;

public set[loc] usedTypes(M3 m) =
    { decl | \class(decl, _) <- types(m)}
    + { decl | \interface(decl, _) <- types(m)};

@doc { Return the return type of a method. If primitive, returns the TypeSymbol. }
public value returnType(M3 m, loc meth) {
    try {
        TypeSymbol ts =
            getUniqueElement({rt | \method(_, _, TypeSymbol rt, _) <- typeOf(m)[meth]});
        return isPrimitive(ts) ? ts : location(ts);
    }
    catch :
        return unknownFieldType();
}


25

:set profiling true

@doc { Return map of declarations; memoized for performance. }
@memo private map[loc, set[loc]] sourceLocMap(M3 m) = toMap(m@declarations);

@doc { Memoize conversion to map for performance.}
@memo public map[loc,set[loc]] parents(M3 m) = toMap(invert(m@containment));

@doc { Return type(s) of an entity; memoized map for performance. }
@memo private map[loc,set[TypeSymbol]] typeOf(M3 m) = toMap(m@types);

@memo public set[loc] externalFields(M3 m) = m@fieldAccess<1> - fields(m);
public bool isExternalField(M3 m, loc f) = f in externalFields(m);

@memo public set[loc] externalMethods(M3 m) = m@methodInvocation<1> - methods(m);
public bool isExternalMethod(M3 m, loc meth) = meth in externalMethods(m);

rascal>:set profiling true
ok

rascal>writeMSE(sm);
PROFILE: 124 data points, 472 ticks, tick = 1 milliSecs
Source File                     Ticks      %   Source
rascal://Set                       40   8.5%   |rascal://Set|(4059,1,<148,18>,<148,19>)
rascal://Set                       36   7.6%   |rascal://Set|(4051,5,<148,10>,<148,15>)
rascal://lang::java::m3::Core      29   6.1%   |rascal://lang::java::m3::Core|(2758,69,<74,28>,<74,97>)
rascal://m3::M3toMSE               25   5.3%   |rascal://m3::M3toMSE|(10252,1,<342,54>,<342,55>)
rascal://lang::java::m3::Core      21   4.4%   |rascal://lang::java::m3::Core|(2925,32,<76,30>,<76,62>)
rascal://Relation                  19   4.0%   |rascal://Relation|(10157,15,<457,2>,<457,17>)
rascal://Relation                  18   3.8%   |rascal://Relation|(10164,8,<457,9>,<457,17>)

Memoizing selected functions and converting certain relations to maps improved performance 60x! (20 sec vs 20 min for one run).
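Both optimizations combine naturally: convert a relation to a map once, and cache the result so that later lookups are dictionary accesses rather than scans of the whole relation. The Python sketch below illustrates this with `functools.lru_cache` standing in for `@memo`; the containment pairs are hypothetical, and the relation is passed as a `frozenset` because cached arguments must be hashable.

```python
from collections import defaultdict
from functools import lru_cache


def to_map(rel):
    """Group a binary relation into a dict from key to set of values."""
    result = defaultdict(set)
    for key, value in rel:
        result[key].add(value)
    return dict(result)


@lru_cache(maxsize=None)
def parents(containment):
    """Map each contained entity to its containers; computed once per relation."""
    return to_map((child, parent) for parent, child in containment)


# Hypothetical containment relation: (container, contained).
containment = frozenset({
    ("snakes", "snakes/Game"),
    ("snakes/Game", "snakes/Game/squares"),
})

parent_map = parents(containment)          # builds the map
print(parent_map["snakes/Game/squares"])   # {'snakes/Game'}
print(parents(containment) is parent_map)  # True: second call hits the cache
```

The cached dictionary is shared between callers, so it should be treated as read-only.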


26

Rascal in Moose


Syntax, Islands, and Water

https://github.com/onierstrasz/rascal-islands.git




28

Easy peasy MSE parser

start syntax Famix = "(" Entity* ")" ;

syntax Entity = "(" EntityName EntityID Attribute* ")" ;

syntax Attribute = "(" AttributeName Value+ ")" ;

syntax Value = String | Boolean | Number | EntityRef ;

...

Nothing tricky here


29

Idea: island parser for structure

Idea proposed by Patrick Viry

Several false starts (grammar errors, ambiguity)

Start with a flat island parser

start syntax Code = Stuff+ ;

syntax Stuff
    = String | Char | Comment | Word | Noise | Paren // flat, no structure
    ;

The idea is to ignore everything except parentheses and curly braces to infer as much as possible about the structure of an unknown language.


30

start syntax Code = code: Stuff* ;

syntax Stuff = Water | Island ;

syntax Water = String | Char | Comment | Noise ;

syntax Island = Word | Struct ;

syntax Struct = round: "(" Code ")" | curly: "{" Code "}" | square: "[" Code "]" ;

Then “graduate” to a structured island parser

Since getting the syntactic elements right is hard, start with a flat parser and then introduce the structure.
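To make the island idea concrete outside Rascal, here is a small hand-written Python sketch of a structured island parser. It is not derived from the grammar above: the token definitions are simplified assumptions (C-style strings, character literals, and comments as water; identifiers as word islands), and all names are hypothetical.

```python
import re

# Water (strings, chars, comments) is tried first so that brackets
# inside it are never mistaken for structure.
TOKEN = re.compile("|".join([
    r'(?P<water>"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|/\*.*?\*/|//[^\n]*)',
    r"(?P<word>[A-Za-z_][A-Za-z0-9_]*)",
    r"(?P<open>[(\[{])",
    r"(?P<close>[)\]}])",
    r"(?P<noise>.)",
]), re.DOTALL)

CLOSER = {"(": ")", "[": "]", "{": "}"}
KIND = {"(": "round", "[": "square", "{": "curly"}


def parse_islands(source):
    """Return nested lists of words and (kind, children) bracket structures."""
    root = []
    stack = [(None, root)]  # pairs of (expected closing bracket, children)
    for match in TOKEN.finditer(source):
        kind, text = match.lastgroup, match.group()
        children = stack[-1][1]
        if kind == "word":
            children.append(text)
        elif kind == "open":
            node = (KIND[text], [])
            children.append(node)
            stack.append((CLOSER[text], node[1]))
        elif kind == "close":
            if stack[-1][0] != text:
                raise SyntaxError(f"unbalanced {text!r} at offset {match.start()}")
            stack.pop()
        # water and noise carry no structure and are dropped
    if len(stack) != 1:
        raise SyntaxError("unclosed bracket at end of input")
    return root


print(parse_islands('int f(int x) { return "a)" + x; } // }'))
# ['int', 'f', ('round', ['int', 'x']), ('curly', ['return', 'x'])]
```

The bracket inside the string literal and the brace inside the trailing comment are treated as water, so they do not disturb the nesting.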


31

Use toy and real code to debug


32

IList r = readLines(a, "`", "\"", "\"", "\\\\\"", "<", "\\\\<", ">", "\\\\>");

|project://rascal-clone/src/org/rascalmpl/library/util/SystemAPI.java|

Homing in on syntax errors

@doc { Binary search to find smallest sublist of lines giving a parse error. }
private list[str] binSearchErrs(type[&T<:Tree] begin, list[str] input) {
    ...
    assert(low+high == input);
    try
        parse(begin, intercalate("\n",low));
    catch :
        return binSearchErrs(begin, low);
    try
        parse(begin, intercalate("\n",high));
    catch :
        return binSearchErrs(begin, high);
    return input; // failed to find a substring with the error
}

Parse errors often did not give enough context. By automating a binary search, minimal examples could be found and turned into test cases. (This worked for the flat parser.)
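The same search strategy can be sketched in Python. The slide elides how the input is split, so the midpoint split and the single-line base case below are assumptions; `toy_parse` is a hypothetical stand-in for a flat parser that rejects any line with an unterminated string.

```python
def smallest_failing(parse, lines):
    """Narrow a failing list of lines down to a smaller failing sublist."""
    if len(lines) <= 1:
        return lines
    mid = len(lines) // 2  # assumed split point
    for half in (lines[:mid], lines[mid:]):
        try:
            parse("\n".join(half))
        except Exception:
            # This half fails on its own: keep searching inside it.
            return smallest_failing(parse, half)
    # Each half parses alone, so the error needs lines from both.
    return lines


def toy_parse(text):
    """Hypothetical flat parser: reject lines with an odd number of quotes."""
    for number, line in enumerate(text.split("\n"), start=1):
        if line.count('"') % 2:
            raise ValueError(f"unterminated string on line {number}")


lines = ['a = 1', 'b = "ok"', 'c = "oops', 'd = 2']
print(smallest_failing(toy_parse, lines))  # ['c = "oops']
```

As the slide notes, this fits a flat parser: with a structured grammar, cutting the input in half can itself introduce errors such as unbalanced brackets.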


33

Ambiguities hard to fix ...

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Player.java|);
bool: true

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Game.java|);
bool: true

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Ladder.java|);
bool: true

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/SimpleGameTest.java|);
bool: true

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Die.java|);
bool: true

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Square.java|);
bool: true

rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/DieTest.java|);
bool: true

lexical Comment
    = "/*" (![*] | [*] !>> [/])* "*/"
    | "//" ![\n]*
    ;

lexical Word = word: [a-zA-Z_][a-zA-Z0-9_\-]* !>> [a-zA-Z0-9_\-] ;

syntax Noise = NoiseChar+ ;

lexical NoiseChar = ![a-zA-Z_(){}\[\]\"\'] | "/" !>> [*/] ;

Somewhere here some ambiguity lurks, but it is hard to track down with the current tools ...


34

Visualization identified test cases

The renderParsetree() library function helped to home in on the problematic cases.


35

Water is hard!

lexical Noise // numbers and operators
    = (![a-zA-Z_(){}\[\]\"\'/])+ !>> ![a-zA-Z_(){}\[\]\"\'/]
    | "/" !>> [*/]
    ;

This worked, but it took a lot of effort to come up with this rule. (One fatal error was that Noise was declared as syntax rather than lexical.)


36

Contextual analysis ... ?

public void countWords(loc project) {
    list[str] allWords =
        ( [] | it + words(parse(#start[Code], src).top) | src <- toList(javaFiles(project)) );
    for (<n,k> <- sort(countStrings(allWords)))
        println(<n,k>);
}

rascal>countWords(|project://p2-SnakesAndLadders|);
<1,"(JExample)">
<1,"(class)">
<1,"Die">
<1,"DieTest">
<1,"FirstSquare">
...
<25,"{(int)}">
<32,"{{this}}">
<33,"{{(game)}}">
<35,"{{game}}">
<37,"{{(position)}}">
<38,"{{assertEquals}}">
<52,"{{return}}">
<68,"{public}">
ok


The Good, the Bad ...


38

Debugging grammars

Debugging failed tests

Misplaced syntax errors


39

Tutor needs work!

Oh no, no OO!

[Figure: excerpt of the FAMIX metamodel as a class diagram. Entities: Class, Method, Attribute, Parameter, Invocation, Access, Inheritance, Type, PrimitiveType, TypedEntity, ClassMember, FileAnchor. Associations: sourceAnchor, element, candidates, previous, sender, receiver, parentType, declaredType, parentBehaviouralEntity, accessor, variable, superclass, subclass.]

vs.

m3(|project://...|)[ @fieldAccess={...}, @extends={...}, @methodInvocation={...}, @typeDependency={...}, @messages=[...], @containment={...}, @names={...}, @implements={...}, @documentation={...}, @uses={...}, @methodOverrides={...}, @types={...}, @modifiers={...}, @declarations={...}]

The lack of OO caused me some culture shock. I felt that functions that applied to certain data types should have been methods. I also missed an OO layer around the M3 models.


40

Tests

public set[TypeSymbol] parameterTypes(M3 m) =
    ( {} | it + e | e <- { toSet(pt) | \method(_, _, _, list[TypeSymbol] pt) <- types(m) });

Compact functional style

Locations

rascal>:set profiling true
ok

rascal>writeMSE(sm);
PROFILE: 124 data points, 472 ticks, tick = 1 milliSecs
Source File                     Ticks      %   Source
rascal://Set                       40   8.5%   |rascal://Set|(4059,1,<148,18>,<148,19>)
rascal://Set                       36   7.6%   |rascal://Set|(4051,5,<148,10>,<148,15>)
...

Profiling

Live and offline help!

Thanks!

Libraries

Integrated testing was very handy. The compact functional style led to lots of 1-liners. Profiling made optimization very easy. Locations made code navigation easy. Feedback was quick offline, but getting live help was even better!

