What I learned from Rascal

Oscar Nierstrasz, Software Composition Group, scg.unibe.ch

SWAT talk, Oct 25 2013; SCG talk, Oct 29 2013

Monday, October 28, 13

Upload: oscar-nierstrasz

Post on 22-May-2015


DESCRIPTION

Rascal is a functional metaprogramming language and system designed for software analysis. M3 is a generic metamodel for representing code parsed in Rascal. In this talk I describe my experience learning Rascal, and using it to query and manipulate M3 models, and to transform them into FAMIX models that can be imported into the Moose analysis platform.

TRANSCRIPT

Page 1: What I learned from Rascal

Oscar Nierstrasz
Software Composition Group

scg.unibe.ch

What I learned from Rascal

SWAT talk, Oct 25 2013
SCG talk, Oct 29 2013

Monday, October 28, 13

Page 2: What I learned from Rascal

Roadmap

First Steps

Warming up

M3 to MSE

Syntax, Islands, and Water

The Good, the Bad ...

Page 3: What I learned from Rascal

First Steps


Page 4: What I learned from Rascal

What’s Rascal?

Rascal is a purely functional metaprogramming language for source code analysis.

Rascal is a purely functional model transformation language and language workbench.

Page 5: What I learned from Rascal

Free download; online help; Stack Overflow; GitHub issue tracking ...

Page 6: What I learned from Rascal

    module basic::hello

    import IO;

    void hello() {
        println("hello");
    }

modules

types

functions

REPL

Eclipse integration

Rascal is a purely functional language with Java-ish syntax. You can run it from a shell, but it is really meant to be used within Eclipse.

Page 7: What I learned from Rascal


Don’t read this first!

Read these recipes

Then the manual!

Missing a gentle introduction

The examples (“recipes”) are nice, but there should be more of them, and they should be introduced step-by-step as exercises.


Page 8: What I learned from Rascal

Warming up


Page 9: What I learned from Rascal

Polymorphism detection

First test case for Rascal:

Can polymorphic call sites be more effectively detected statically (false positives) or dynamically (false negatives)?

1. Implement static heuristics based on M3
2. ...
3. ...

This first project was designed to see how well Rascal and M3 are suited to static and dynamic analysis. Since M3 does not go below the method level, only some simple heuristics were implemented.

Page 10: What I learned from Rascal

What’s M3?

M3 is a language-independent meta-model for software analysis

Like FAMIX ...

    module analysis::m3::Core
    ...

    data M3 = m3(loc id);

    anno rel[loc name, loc src] M3@declarations; // maps declarations to where they are declared. contains any kind of data or type or code declaration (classes, fields, methods, variables, etc. etc.)
    anno rel[loc name, TypeSymbol typ] M3@types; // assigns types to declared source code artifacts
    anno rel[loc src, loc name] M3@uses; // maps source locations of usages to the respective declarations
    anno rel[loc from, loc to] M3@containment; // what is logically contained in what else (not necessarily physically, but usually also)
    anno list[Message messages] M3@messages; // error messages and warnings produced while constructing a single m3 model
    anno rel[str simpleName, loc qualifiedName] M3@names; // convenience mapping from logical names to end-user readable (GUI) names, and vice versa
    anno rel[loc definition, loc comments] M3@documentation; // comments and javadoc attached to declared things
    anno rel[loc definition, Modifier modifier] M3@modifiers; // modifiers associated with declared things

    ...

Each annotation is a list or a relation ...

Page 11: What I learned from Rascal

M3 for Java

    module lang::java::m3::Core
    ...
    extend analysis::m3::Core;

    ...

    anno rel[loc from, loc to] M3@extends; // classes extending classes and interfaces extending interfaces
    anno rel[loc from, loc to] M3@implements; // classes implementing interfaces
    anno rel[loc from, loc to] M3@methodInvocation; // methods calling each other (including constructors)
    anno rel[loc from, loc to] M3@fieldAccess; // code using data (like fields)
    anno rel[loc from, loc to] M3@typeDependency; // using a type literal in some code (types of variables, annotations)
    anno rel[loc from, loc to] M3@methodOverrides; // which method override which other methods

    ...

extends M3 with Java specifics

Page 12: What I learned from Rascal

    m3(|project://p2-SnakesAndLadders|)[
      @fieldAccess={
        <|java+method://p2-SnakesAndLadders/snakes/Game/toString()|,|java+field://p2-SnakesAndLadders/snakes/Game/squares|>,
        <|java+method://p2-SnakesAndLadders/snakes/Game/addSquares(int)|,|java+field://p2-SnakesAndLadders/snakes/Game/squares|>,
        ...
      },
      @extends={
        <|java+class://p2-SnakesAndLadders/snakes/Snake|,|java+class://p2-SnakesAndLadders/snakes/Ladder|>,
        <|java+class://p2-SnakesAndLadders/snakes/Ladder|,|java+class://p2-SnakesAndLadders/snakes/Square|>,
        ...
      },
      @methodInvocation={
        <|java+method://p2-SnakesAndLadders/snakes/SimpleGameTest/move8jillWins(snakes.Game)|,|java+method://p2-SnakesAndLadders/snakes/Game/winner()|>,
        <|java+variable://p2-SnakesAndLadders/snakes/Game/toString()/buffer|,|java+constructor://p2-SnakesAndLadders/java/lang/StringBuffer/StringBuffer()|>,
        ...
      },
      @typeDependency={
        <|java+method://p2-SnakesAndLadders/snakes/Square/nextSquare()|,|java+primitiveType://p2-SnakesAndLadders/int|>,
        <|java+method://p2-SnakesAndLadders/snakes/DieTest/testMinReached()|,|java+class://p2-SnakesAndLadders/snakes/DieTest|>,
        ...

Rascal talks to Eclipse to build an M3 model

    M3 snakesM3 = createM3FromEclipseProject(|project://p2-SnakesAndLadders|);

URIs (“locations”)

Page 13: What I learned from Rascal

Most M3 queries are 1-liners

    @doc { Return the set of (all) subtypes of a type. }
    @memo public set[loc] subtypes(M3 m, loc aType) =
        invert(getDeclaredTypeHierarchy(m)+)[aType];

Memoized function

Transitive closure

Invert relation
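The subtypes one-liner combines three relational operations: transitive closure (`+`), `invert`, and taking the image of a value by subscripting. The sketch below replays them on a toy relation. It is not the M3 API: the module name `demo::Subtypes`, the string-valued `hierarchy` and the `FirstSquare` pair are assumptions made for illustration, while the Snake/Ladder/Square pairs mirror the @extends excerpt shown earlier.

```rascal
module demo::Subtypes

import Relation; // invert

// Toy <subtype, supertype> pairs (hypothetical stand-in for the declared type hierarchy).
rel[str, str] hierarchy = {
    <"Snake", "Ladder">,
    <"Ladder", "Square">,
    <"FirstSquare", "Square"> // invented pair, for illustration only
};

// hierarchy+  adds the indirect pairs, e.g. <"Snake", "Square">;
// invert      flips every pair to <supertype, subtype>;
// [t]         takes the image of t, i.e. all of its subtypes.
set[str] subtypesOf(str t) = invert(hierarchy+)[t];

test bool squareSubtypes() = subtypesOf("Square") == {"Snake", "Ladder", "FirstSquare"};
test bool leafHasNoSubtypes() = subtypesOf("Snake") == {};
```

Subscripting a relation never fails: a value with no pairs simply yields the empty set, which is why the second test holds without any special casing.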

Page 14: What I learned from Rascal

    @doc { Returns the source URI for the method URI. }
    public loc getSource(M3 m, loc method) =
        getUniqueElement(m@declarations[method]);

    @doc { Returns unique element of a set, or fails. }
    private &T getUniqueElement(set[&T] s) {
        assert size(s) == 1;
        return getOneFrom(s);
    }

Many M3 relations are actually (partial) functions

Note the use of generics and assertions.

Page 15: What I learned from Rascal

    @doc { Return classes with subclasses and interfaces with >1 implementations. }
    public set[loc] polymorphTypes(M3 m) {
        set[loc] types = getDeclaredTypeHierarchy(m)<0>;
        return { t | t <- types, (isClass(t) && size(subtypes(m,t)) > 0)
                              || (isInterface(t) && size(subtypes(m,t)) > 1) };
    }

    @doc { Returns the type symbol for a given class loc. }
    public TypeSymbol getTypeSymbol(M3 m, loc t) = getUniqueElement(m@types[t]);

    @doc { Return fields declared to be of polymorphic types. }
    public set[loc] polymorphFields(M3 m) {
        set[TypeSymbol] ts = { getTypeSymbol(m,t) | t <- polymorphTypes(m) };
        return { t | t <- invert(m@types)[ts], isField(t) };
    }

Polymorphic candidates

Some heuristics are easy to express; others would require access to the AST ...

    private loc squareField = |java+field://p2-SnakesAndLadders/snakes/Player/square|;
    test bool testPolymorphFields() = polymorphFields(snakes()) == { squareField };

Page 16: What I learned from Rascal

M3 to MSE

https://github.com/onierstrasz/rascal-m3-to-mse.git


Page 17: What I learned from Rascal

    (
        (FAMIX.Namespace (id: 182)
            (name 'snakes'))
        (FAMIX.Class (id: 8)
            (name 'DieTest')
    ...

    m3(|project://...|)[
      @fieldAccess={ ... },
    ...

Idea: steal models from Rascal

Especially interesting for languages other than Java!

Page 18: What I learned from Rascal

Need IDs for all FAMIX entities

    @doc { Returns a map from FAMIX entity values to unique IDs. }
    @memo public map[value,int] idMap(M3 m) {
        set[value] entities = m@declarations<0> + m@declarations<1> // classes, methods, ...
            + primitiveTypes(m) // NB: TypeSymbols; all others are locations
            + importedTypes(m)
            + { unknownFieldType() }
            + m@methodInvocation<1> // external methods
            + m@fieldAccess<1> // external fields
            + m@extends + m@implements // inheritances
            + m@methodInvocation
            + m@fieldAccess
            + importedPackages(m);
        return index(entities);
    }

NB: locations or TypeSymbols

entities we might need...

handy library function!

It was an iterative process to figure out what entities were needed ...
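The "handy library function" is `index` from the Set library, which maps every element of a set to a distinct integer. A minimal sketch of that idea follows, assuming plain strings in place of locations and TypeSymbols; the module name `demo::Ids` and the helper `idsFor` are hypothetical.

```rascal
module demo::Ids

import Set; // index, size
import Map; // domain, range

// Hypothetical helper: number arbitrary values, as idMap does for FAMIX entities.
map[value, int] idsFor(set[value] entities) = index(entities);

test bool everyEntityGetsItsOwnId() {
    set[value] entities = {"snakes", "Game", "Square"};
    map[value, int] ids = idsFor(entities);
    // Every entity is a key, and no two entities share an ID.
    return domain(ids) == entities && size(range(ids)) == size(entities);
}
```

The test deliberately checks only uniqueness and coverage, since which integer a given element receives is up to the library.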

Page 19: What I learned from Rascal

Short cut: directly spit out MSE

    @doc { Write the MSE for an Eclipse Java project to its source directory. }
    public void writeMSE(M3 m) {
        loc file = m.id + "<m.id.authority>.mse";
        writeFile(file, "(\n");
        appendPackages(file, m);
        appendClasses(file, m);
        ...
        appendToFile(file, ")\n");
    }

    private void appendClasses(loc file, M3 m) {
        for (loc c <- classes(m) + interfaces(m) + anonClasses(m)) {
            appendToFile(file,
                "    (FAMIX.Class (id: <getID(m,c)>)
                '        (name \'<getClassName(c)>\')
                '        (container (ref: <getID(m, getClassPackage(m, c))>))
                '        (isInterface <isInterface(c)?true:false>))
                '");
            // TODO: modifiers, sourceAnchor ...
        }
    }

Easy, but complicates debugging

Rascal string templates

Directly spitting out MSE means there is no way to check the consistency of the output
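The string template in appendClasses relies on two features: `<...>` interpolates any Rascal expression into the string, and on continuation lines everything up to and including the leading `'` is treated as margin and dropped. A reduced sketch, assuming a hypothetical helper `famixClass` that takes plain values instead of an M3 model (the module name is likewise made up):

```rascal
module demo::Template

import IO; // println

// Hypothetical helper: render one FAMIX.Class entity from plain values.
// The leading ' marks the left margin of each continuation line.
str famixClass(int id, str name, bool isInterface) =
    "(FAMIX.Class (id: <id>)
    '    (name \'<name>\')
    '    (isInterface <isInterface>))";

// Print one entity; the values echo the MSE excerpt shown earlier.
void demo() {
    println(famixClass(8, "DieTest", false));
}
```

Returning the string rather than appending it to a file straight away would also leave room for a consistency check before anything is written.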

Page 20: What I learned from Rascal

Debugging MSE is painful

Moose did not make it easy to track down errors in the generated MSE. A script to post-check for dangling references helped.

Page 21: What I learned from Rascal

    @doc { Return the package URI for a given class URI. }
    public loc getClassPackage(M3 m, loc c) {
        set[loc] parents = parents(m)[c]?{};
        if (isEmpty(parents)) {
            return unknownPackage(c);
        }
        loc parent = getUniqueElement(parents);
        return isPackage(parent) ? parent : getClassPackage(m, parent);
    }

Missing abstractions were easy to build

One of few functions that weren’t 1-liners.

Page 22: What I learned from Rascal

    private bool isPrimitive(\int()) = true;
    private bool isPrimitive(\float()) = true;
    private bool isPrimitive(\double()) = true;
    private bool isPrimitive(\short()) = true;
    private bool isPrimitive(\boolean()) = true;
    private bool isPrimitive(\char()) = true;
    private bool isPrimitive(\byte()) = true;
    private bool isPrimitive(\long()) = true;
    private bool isPrimitive(\void()) = true;
    private bool isPrimitive(\null()) = true;
    private bool isPrimitive(\array(_,_)) = true;
    private bool isPrimitive(\typeParameter(_, _)) = true;
    private default bool isPrimitive(TypeSymbol s) = false;

    public set[TypeSymbol] primitiveTypes(M3 m) =
        { t | t <- types(m), isPrimitive(t) }
        + { t | t <- returnTypes(m), isPrimitive(t) }
        + { t | t <- parameterTypes(m), isPrimitive(t) };

    @doc { Return the ID of the type of a field. }
    public int declaredTypeID(M3 m, loc f) {
        try
            TypeSymbol ts = declaredTS(m, f);
        catch :
            return getID(m, unknownFieldType());
        if (isPrimitive(ts)) {
            return getID(m, ts);
        }
        return getID(m, location(ts));
    }
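The isPrimitive definitions use pattern-directed dispatch: each alternative matches one constructor, and the `default` alternative is tried only when none of the others applies. The sketch below shows the same shape on a made-up data type; `Ty` and its constructors are hypothetical and much smaller than the real TypeSymbol.

```rascal
module demo::Dispatch

// Hypothetical miniature of TypeSymbol, for illustration only.
data Ty
    = intTy()
    | boolTy()
    | classTy(str name)
    ;

// One alternative per constructor pattern ...
bool isPrim(intTy()) = true;
bool isPrim(boolTy()) = true;
// ... and a default that applies only when no other alternative matches.
default bool isPrim(Ty _) = false;

test bool intIsPrimitive() = isPrim(intTy());
test bool classIsNotPrimitive() = !isPrim(classTy("Square"));
```

The backslashes in the slide's code (`\int()`, `\void()`) are there because those constructor names collide with Rascal keywords; the toy names above avoid the clash.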

Page 23: What I learned from Rascal

    public set[loc] locations(class(loc decl, _)) = { decl };
    public set[loc] locations(interface(loc decl, _)) = { decl };
    public set[loc] locations(method(loc decl, _, _, _)) = { decl };
    public set[loc] locations(constructor(loc decl, _)) = { decl };
    public set[loc] locations(enum(loc decl)) = { decl };
    public set[loc] locations(typeParameter(decl,_)) = { decl };
    public set[loc] locations(object()) = { unknownFieldType() }; // TEMPORARY HACK
    public default set[loc] locations(TypeSymbol _) = {};

    test bool testLocations1() = locations(\int()) == {};
    test bool testLocations2() = locations(playerTS) == {playerClass};
    test bool testLocations3() = locations(setSquareTS) == {setSquare};

    public loc location(TypeSymbol ts) {
        assert(!isPrimitive(ts));
        return getUniqueElement(locations(ts));
    }

    @doc { Returns the locations of a set of TypeSymbols. }
    public set[loc] locationsOf(set[TypeSymbol] tsSet) {
        return { location(ts) | ts <- tsSet, !isPrimitive(ts) };
    }

Page 24: What I learned from Rascal

    @doc { Returns imported classes and interfaces, i.e., used, but not declared. }
    public set[loc] importedTypes(M3 m) =
        usedTypes(m)
        + superTypes(m)
        + locationsOf(returnTypes(m))
        + locationsOf(parameterTypes(m))
        - m@declarations<0>;

    public set[loc] superTypes(M3 m) = m@extends<1> + m@implements<1>;

    public set[loc] usedTypes(M3 m) =
        { decl | \class(decl, _) <- types(m)}
        + { decl | \interface(decl, _) <- types(m)};

    @doc { Return the return type of a method. If primitive, returns the TypeSymbol. }
    public value returnType(M3 m, loc meth) {
        try {
            TypeSymbol ts =
                getUniqueElement({rt | \method(_, _, TypeSymbol rt, _) <- typeOf(m)[meth]});
            return isPrimitive(ts) ? ts : location(ts);
        }
        catch :
            return unknownFieldType();
    }

Page 25: What I learned from Rascal

:set profiling true

    @doc { Return map of declarations; memoized for performance. }
    @memo private map[loc, set[loc]] sourceLocMap(M3 m) = toMap(m@declarations);

    @doc { Memoize conversion to map for performance.}
    @memo public map[loc,set[loc]] parents(M3 m) = toMap(invert(m@containment));

    @doc { Return type(s) of an entity; memoized map for performance. }
    @memo private map[loc,set[TypeSymbol]] typeOf(M3 m) = toMap(m@types);

    @memo public set[loc] externalFields(M3 m) = m@fieldAccess<1> - fields(m);
    public bool isExternalField(M3 m, loc f) = f in externalFields(m);

    @memo public set[loc] externalMethods(M3 m) = m@methodInvocation<1> - methods(m);
    public bool isExternalMethod(M3 m, loc meth) = meth in externalMethods(m);

    rascal>:set profiling true
    ok

    rascal>writeMSE(sm);
    PROFILE: 124 data points, 472 ticks, tick = 1 milliSecs
    Source File                    Ticks  %     Source
    rascal://Set                   40     8.5%  |rascal://Set|(4059,1,<148,18>,<148,19>)
    rascal://Set                   36     7.6%  |rascal://Set|(4051,5,<148,10>,<148,15>)
    rascal://lang::java::m3::Core  29     6.1%  |rascal://lang::java::m3::Core|(2758,69,<74,28>,<74,97>)
    rascal://m3::M3toMSE           25     5.3%  |rascal://m3::M3toMSE|(10252,1,<342,54>,<342,55>)
    rascal://lang::java::m3::Core  21     4.4%  |rascal://lang::java::m3::Core|(2925,32,<76,30>,<76,62>)
    rascal://Relation              19     4.0%  |rascal://Relation|(10157,15,<457,2>,<457,17>)
    rascal://Relation              18     3.8%  |rascal://Relation|(10164,8,<457,9>,<457,17>)

Memoizing selected functions and converting certain relations to maps improved performance 60x! (20 sec vs 20 min for one run).
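The optimization pattern is always the same: convert a relation to a map once, cache that conversion with `@memo`, and turn every later query into a map lookup. A small sketch under stated assumptions: the module name, the string-valued `containment` relation and the helpers `parentsOf` and `parent` are all hypothetical; `invert` comes from the Relation library and `toMap` from the Set library.

```rascal
module demo::Lookup

import Relation; // invert
import Set;      // toMap

// Hypothetical <container, contained> pairs standing in for the containment relation.
rel[str, str] containment = {
    <"snakes", "Game">,
    <"snakes", "Square">,
    <"Game", "toString()">
};

// The conversion to a map is done once per argument; @memo caches the result.
@memo
map[str, set[str]] parentsOf(rel[str, str] r) = toMap(invert(r));

// Each query is now a map lookup; ? {} supplies a default for unknown keys.
set[str] parent(str child) = parentsOf(containment)[child] ? {};

test bool gameIsInSnakes() = parent("Game") == {"snakes"};
test bool rootHasNoParent() = parent("snakes") == {};
```

Unlike subscripting a relation, looking up a missing key in a map raises an exception, hence the `? {}` default, the same idiom getClassPackage uses.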

Page 26: What I learned from Rascal


Rascal in Moose


Page 27: What I learned from Rascal

Syntax, Islands, and Water

https://github.com/onierstrasz/rascal-islands.git


Page 28: What I learned from Rascal

Easy peasy MSE parser

    start syntax Famix = "(" Entity* ")" ;

    syntax Entity = "(" EntityName EntityID Attribute* ")" ;

    syntax Attribute = "(" AttributeName Value+ ")" ;

    syntax Value = String | Boolean | Number | EntityRef ;

    ...

Nothing tricky here

Page 29: What I learned from Rascal

Idea: island parser for structure

Idea proposed by Patrick Viry

Several false starts (grammar errors, ambiguity)

Start with a flat island parser

    start syntax Code = Stuff+ ;

    syntax Stuff
        = String
        | Char
        | Comment
        | Word
        | Noise
        | Paren // flat, no structure
        ;

The idea is to ignore everything except parentheses and curly braces to infer as much as possible about the structure of an unknown language.

Page 30: What I learned from Rascal

    start syntax Code = code: Stuff* ;

    syntax Stuff = Water | Island ;

    syntax Water = String | Char | Comment | Noise ;

    syntax Island = Word | Struct ;

    syntax Struct
        = round: "(" Code ")"
        | curly: "{" Code "}"
        | square: "[" Code "]"
        ;

Then “graduate” to a structured island parser

Since getting the syntactic elements right is hard, start with a flat parser and then introduce the structure.

Page 31: What I learned from Rascal


Use toy and real code to debug


Page 32: What I learned from Rascal

    IList r = readLines(a, "`", "\"", "\"", "\\\\\"", "<", "\\\\<", ">", "\\\\>");

|project://rascal-clone/src/org/rascalmpl/library/util/SystemAPI.java|

Homing in on syntax errors

    @doc { Binary search to find smallest sublist of lines giving a parse error. }
    private list[str] binSearchErrs(type[&T<:Tree] begin, list[str] input) {
        ...
        assert(low+high == input);
        try
            parse(begin, intercalate("\n",low));
        catch :
            return binSearchErrs(begin, low);
        try
            parse(begin, intercalate("\n",high));
        catch :
            return binSearchErrs(begin, high);
        return input; // failed to find a substring with the error
    }

Parse errors often did not give enough context. By automating a binary search, minimal examples could be found and turned into test cases. (This worked for the flat parser.)
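The search strategy can be tried out without a grammar. In the sketch below the parser is replaced by a hypothetical predicate `fails` that flags any input containing the line "BAD", and the input is split at the midpoint; both are assumptions, since the slide elides how binSearchErrs computes `low` and `high`.

```rascal
module demo::Shrink

import List; // size

// Hypothetical stand-in for "parsing these lines raises a parse error".
bool fails(list[str] lines) = "BAD" in lines;

// Keep descending into whichever half still fails; stop when neither half does.
list[str] shrink(list[str] input) {
    if (size(input) <= 1) {
        return input;
    }
    int mid = size(input) / 2;      // midpoint split (an assumption)
    list[str] low = input[..mid];   // first half
    list[str] high = input[mid..];  // second half
    if (fails(low)) {
        return shrink(low);
    }
    if (fails(high)) {
        return shrink(high);
    }
    return input; // the error only shows up when both halves are combined
}

test bool findsTheBadLine() = shrink(["a", "b", "BAD", "c"]) == ["BAD"];
```

The final `return input` covers the case where an error spans the split point, so the result is a small failing example but not necessarily the smallest possible one.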

Page 33: What I learned from Rascal

Ambiguities hard to fix ...

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Player.java|);
    bool: true

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Game.java|);
    bool: true

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Ladder.java|);
    bool: true

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/SimpleGameTest.java|);
    bool: true

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Die.java|);
    bool: true

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/Square.java|);
    bool: true

    rascal>/amb(_) := parse(#start[Code], |project://p2-SnakesAndLadders/src/snakes/DieTest.java|);
    bool: true

    lexical Comment
        = "/*" (![*] | [*] !>> [/])* "*/"
        | "//" ![\n]*
        ;

    lexical Word = word: [a-zA-Z_][a-zA-Z0-9_\-]* !>> [a-zA-Z0-9_\-] ;

    syntax Noise = NoiseChar+ ;

    lexical NoiseChar = ![a-zA-Z_(){}\[\]\"\'] | "/" !>> [*/] ;

Somewhere here some ambiguity lurks, but it is hard to track down with the current tools ...
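The `/amb(_) := ...` queries use a descendant pattern: a leading `/` makes the match succeed if the pattern occurs anywhere inside the value, at any depth. The same mechanism works on ordinary data types, as in this sketch; the `Expr` type and `hasHole` are hypothetical and have nothing to do with parse trees.

```rascal
module demo::DeepMatch

// Hypothetical expression type, for illustration only.
data Expr
    = num(int n)
    | add(Expr l, Expr r)
    | hole()
    ;

// The leading / searches the whole value, however deeply hole() is nested.
bool hasHole(Expr e) = /hole() := e;

test bool findsNestedHole() = hasHole(add(num(1), add(hole(), num(2))));
test bool noHoleNoMatch() = !hasHole(add(num(1), num(2)));
```

As on the slide, the match only answers yes or no; finding where the offending node sits still takes further digging.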

Page 34: What I learned from Rascal


Visualization identified test cases

The renderParsetree() library function helped to home in on the problematic cases.


Page 35: What I learned from Rascal

Water is hard!

    lexical Noise // numbers and operators
        = (![a-zA-Z_(){}\[\]\"\'/])+ !>> ![a-zA-Z_(){}\[\]\"\'/]
        | "/" !>> [*/]
        ;

This worked, but it took a lot of effort to come up with this rule. (One fatal error was that Noise was declared as syntax rather than lexical.)

Page 36: What I learned from Rascal

Contextual analysis ... ?

    public void countWords(loc project) {
        list[str] allWords =
            ( [] | it + words(parse(#start[Code], src).top) | src <- toList(javaFiles(project)) );
        for (<n,k> <- sort(countStrings(allWords)))
            println(<n,k>);
    }

    rascal>countWords(|project://p2-SnakesAndLadders|);
    <1,"(JExample)">
    <1,"(class)">
    <1,"Die">
    <1,"DieTest">
    <1,"FirstSquare">
    ...
    <25,"{(int)}">
    <32,"{{this}}">
    <33,"{{(game)}}">
    <35,"{{game}}">
    <37,"{{(position)}}">
    <38,"{{assertEquals}}">
    <52,"{{return}}">
    <68,"{public}">
    ok
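countWords builds its word list with a reducer expression of the form `( initial | step | generators )`, in which `it` names the value accumulated so far. Two reduced examples of the construct follow; the module and function names are hypothetical.

```rascal
module demo::Reducer

// ( initial value | next value, computed from it | generators )
int total(list[int] xs) = ( 0 | it + x | x <- xs );

// The same shape with sets: start from {} and keep adding elements.
set[int] unionAll(set[set[int]] sets) = ( {} | it + s | s <- sets );

test bool sumsList() = total([1, 2, 3]) == 6;
test bool unionsSets() = unionAll({{1, 2}, {2, 3}}) == {1, 2, 3};
```

A reducer plays the role a fold plays in other functional languages, which is a large part of why so many functions in this project fit on one line.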

Page 37: What I learned from Rascal

The Good, the Bad ...


Page 38: What I learned from Rascal


Debugging grammars

Debugging failed tests

Misplaced syntax errors


Page 39: What I learned from Rascal

Tutor needs work!

Oh no, no OO!

[Figure: excerpt of the FAMIX metamodel as a class diagram, with entities Class, Method, Attribute, Parameter, Type, PrimitiveType, TypedEntity, ClassMember, FileAnchor, Invocation, Access and Inheritance, linked by associations such as sourceAnchor, parentType, declaredType, parentBehaviouralEntity, sender, receiver, candidates, accessor, variable, superclass and subclass]

vs.

    m3(|project://...|)[
      @fieldAccess={...},
      @extends={...},
      @methodInvocation={...},
      @typeDependency={...},
      @messages=[...],
      @containment={...},
      @names={...},
      @implements={...},
      @documentation={...},
      @uses={...},
      @methodOverrides={...},
      @types={...},
      @modifiers={...},
      @declarations={...}
    ]

The lack of OO caused me some culture shock. I felt that functions that applied to certain data types should have been methods. I also missed an OO layer around the M3 models.

Page 40: What I learned from Rascal

Tests

    public set[TypeSymbol] parameterTypes(M3 m) =
        ( {} | it + e | e <- { toSet(pt) | \method(_, _, _, list[TypeSymbol] pt) <- types(m) });

Compact functional style

Locations

    rascal>:set profiling true
    ok

    rascal>writeMSE(sm);
    PROFILE: 124 data points, 472 ticks, tick = 1 milliSecs
    Source File    Ticks  %     Source
    rascal://Set   40     8.5%  |rascal://Set|(4059,1,<148,18>,<148,19>)
    rascal://Set   36     7.6%  |rascal://Set|(4051,5,<148,10>,<148,15>)
    ...

Profiling

Live and offline help!

Thanks!

Libraries

Integrated testing was very handy. The compact functional style led to lots of 1-liners. Profiling made optimization very easy. Locations made code navigation easy. Feedback was quick offline, but getting live help was even better!