scala json features and performance

SCALA JSON FEATURES AND PERFORMANCES John Nestor- 47 Degrees [email protected] Dragos Manolescu dam@micro-workflow.com https://github.com/47deg/json-perf 47deg.com 1

Upload: john-nestor

Posted on 25-Jan-2017


Page 1: Scala Json Features and Performance

SCALA JSON FEATURES AND PERFORMANCES

John Nestor- 47 Degrees [email protected]

Dragos Manolescu dam@micro-workflow.com

https://github.com/47deg/json-perf

47deg.com

Page 2: Scala Json Features and Performance

DISCLAIMER

• Best effort attempt to measure performance and describe features.

• Corrections always appreciated.

• Also let us know any Json parsers we missed.

Page 3: Scala Json Features and Performance

SCALA JSON

• There are lots of Scala Json parsers.

• You can also use Java Json parsers in Scala.

• How to choose:

  • Performance

  • Features

  • API

  • Support (will not be abandoned)

  • License (most are Apache 2)

Page 4: Scala Json Features and Performance

SCALA (2.11) JSON PARSERS

Parser | URL | Version | Language
Persist Json | https://github.com/nestorpersist/json | 1.1.0 | Scala
Rojoma | https://github.com/rjmac/rojoma-json | 3.3.0 | Scala
Jackson | http://wiki.fasterxml.com/JacksonHome | 2.5.3 | Scala/Java
Spray Json | https://github.com/spray/spray-json | 1.3.2 | Scala
Lift Json | https://github.com/lift/lift/tree/master/framework/lift-base/lift-json/ | 2.6.2 | Scala
Twitter Json | https://github.com/stevej/scala-json | NA | Scala
Scala Library | https://github.com/scala/scala-parser-combinators | 1.0.4 | Scala
Play Json | https://www.playframework.com/documentation/2.0/ScalaJson | 2.4.1 | Scala/Java
Json Smart | https://github.com/netplex/json-smart-v2 | 2.1.0 | Java
Argonaut | http://argonaut.io/ | 6.0.4 | Scala
JAWN | https://github.com/non/jawn | 0.8.0 | Scala

Page 5: Scala Json Features and Performance

THE PARSERS (1 OF 4)

• Scala Library. This parser ships with Scala in package scala.util.parsing.json; since Scala 2.11 it lives in the separate scala-parser-combinators module. It is implemented using parser combinators.

• Twitter Json. A cleaned-up version of the Json parser in Odersky, Spoon and Venners' book Programming in Scala. It is implemented using parser combinators. Written by Steve Jenson while at Twitter.

• Persist Json. Developed as part of OStore, a new NoSQL database written in Scala. OStore started with the Twitter parser, which turned out to be much too slow, so the parser was rewritten from scratch, keeping mostly the same API but with an emphasis on speed. Developed by John Nestor (with the codec-based mapper by JR Dejardin).

Page 6: Scala Json Features and Performance

THE PARSERS (2 OF 4)

• Play Json. A part of the Typesafe Play framework. Implemented using Jerkson, a Scala wrapper on Jackson.

• Lift Json. Developed as part of Lift, a framework for building web apps.

• Spray Json. Developed as part of Spray, a REST/HTTP network IO toolkit.

Page 7: Scala Json Features and Performance

THE PARSERS (3 OF 4)

• Argonaut. Purely functional Json in Scala. Uses Scalaz.

• Rojoma. Another Scala parser that makes extensive use of Scala's functional features. Developed by Robert Macomber of Socrata.

• Jawn. Jawn was designed to parse JSON into an AST as quickly as possible.

Page 8: Scala Json Features and Performance

THE PARSERS (4 OF 4)

• Jackson. Generally regarded as the best and fastest Java Json parser. Has a very rich set of features. We test using the DefaultScalaModule (by Chris Currie) that provides Scala support.

• Json Smart. A newer Json parser written in Java that is faster than Jackson.

Page 9: Scala Json Features and Performance

TEST SETS FOR PERFORMANCE TESTING

• Twitter. Tweets processed by the Yap.tv Guide (http://j.mp/15WL0p3), a service providing a personalized TV guide companion experience based on social content from Twitter and Facebook. This data set contains 100 tweets in Json (http://j.mp/13lKbU6).

• Google. PlaceSearchResults returned by Google in response to place queries at 100 locations. The locations correspond to the top 100 best places to live in 2012, as compiled by CNN Money (http://j.mp/13NmVid). This data set contains 138 PlaceSearchResults in Json (http://j.mp/13NmCUC) using keyword “brewery” and a radius of 2 miles.

• Each file has one Json object per line.
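Because each data file holds one Json document per line, loading a test set is just reading lines. A minimal sketch, assuming a local copy of one of the data files; the object name `TestSet` is hypothetical and not taken from the json-perf repository:

```scala
import java.nio.file.Path
import scala.io.Source

object TestSet {
  /** Reads a file with one Json document per line, skipping blank lines. */
  def load(path: Path): Vector[String] = {
    val src = Source.fromFile(path.toFile, "UTF-8")
    try src.getLines().map(_.trim).filter(_.nonEmpty).toVector
    finally src.close() // always release the file handle
  }
}
```

Each element of the returned vector is then handed to a parser on its own, so one malformed line cannot affect the others.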

Page 10: Scala Json Features and Performance

PRETTY SAMPLE TWITTER JSON

{"contributors":null, "coordinates":null, "created_at":"Mon Jun 27 21:45:46 +0000 2011", "entities": {"hashtags":[], "urls": [{"display_url":"mercynotes.com", "expanded_url":"http://www.mercynotes.com/", "indices":[61,80], "url":"http://t.co/lKzLFOd" } ], "user_mentions":[] }, "favorited":false, "geo":null, "id":85463859615379456, "id_str":"85463859615379456", "in_reply_to_screen_name":null, "in_reply_to_status_id":null, "in_reply_to_status_id_str":null, "in_reply_to_user_id":null, "in_reply_to_user_id_str":null, "place":null, "retweet_count":0, "retweeted":false, "source":"web", "text": "Been watching Wimbledon? Check out new post Love and Tennis: http://t.co/lKzLFOd", "truncated":false, "user": {"contributors_enabled":false, "created_at":"Mon May 30 16:35:44 +0000 2011", "default_profile":true, "default_profile_image":false, "description":"", "favourites_count":0, "follow_request_sent":null, "followers_count":6, "following":null, "friends_count":12, "geo_enabled":false, "id":307978890, "id_str":"307978890", "is_translator":false, "lang":"en", "listed_count":0, "location":"NC", "name":"Julie LaJoe", "notifications":null, "profile_background_color":"C0DEED", "profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png", "profile_background_tile":false, "profile_image_url": "http://a0.twimg.com/profile_images/1375001769/JulieMNnew__2__normal.jpg", "profile_image_url_https": "https://si0.twimg.com/profile_images/1375001769/JulieMNnew__2__normal.jpg", "profile_link_color":"0084B4", "profile_sidebar_border_color":"C0DEED", "profile_sidebar_fill_color":"DDEEF6", "profile_text_color":"333333", "profile_use_background_image":true, "protected":false, "screen_name":"mercynotes", "show_all_inline_media":false, "statuses_count":13, "time_zone":"Quito", "url":"http://mercynotes.com", "utc_offset":-18000, "verified":false } }

Page 11: Scala Json Features and Performance

PRETTY SAMPLE GOOGLE JSON

{"address_components": [{"long_name":"622", "short_name":"622", "types":["street_number"] }, {"long_name":"South Rangeline Road", "short_name":"South Rangeline Road", "types":["route"] }, {"long_name":"Carmel", "short_name":"Carmel", "types":["locality","political"] }, {"long_name":"Hamilton", "short_name":"Hamilton", "types": ["administrative_area_level_2","political"] }, {"long_name":"Indiana", "short_name":"Indiana", "types": ["administrative_area_level_1","political"] }, {"long_name":"US", "short_name":"US", "types":["country","political"] }, {"long_name":"46032", "short_name":"46032", "types":["postal_code"] } ], "formatted_address": "Suite Q, 622 South Rangeline Road, Carmel, Indiana, United States", "formatted_phone_number":"(317) 429-6345", "geometry": {"location":{"lat":39.971703,"lng":-86.129099}}, "icon": "http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png", "id":"fcd83d32717980ec1fec2c7ec8719389b201a331", "international_phone_number":"+1 317-429-6345", "name":"Union Brewing Company", "opening_hours": {"open_now":true, "periods": [{"close":{"day":0,"time":"2000"}, "open":{"day":0,"time":"1200"} }, {"close":{"day":2,"time":"2200"}, "open":{"day":2,"time":"1600"} }, {"close":{"day":4,"time":"2200"}, "open":{"day":4,"time":"1600"} }, {"close":{"day":6,"time":"0000"}, "open":{"day":5,"time":"1500"} }, {"close":{"day":0,"time":"0000"}, "open":{"day":6,"time":"1200"} } ] }, "photos": [{"height":1632, "html_attributions": ["<a href=\"https://plus.google.com/117934275405882297051\">Greg Magnusson</a>" ], "photo_reference": "CnRoAAAAvN9y_gkgZIGa13kUSyyBlqwholvjtH4NKo-BzvlklcX-Tt9Ysc6HRMXPxKl3PumZtiOnomHi-Nk83y-lxf8RX8nsWulwuCBpY2okAqaU9wohOhncStFPZlKr02t3WquA6pt8mfCYYO-NAdU2HwdM1hIQYJmus4wpQBaRtP7BFdYhzRoU4XvzfAAQQwkdJZluFJ-tDoUulIo", "width":1224 } ], "reference": "CoQBcgAAAF3VKrWBUmLMv5tLs1Ru47j3Tbxa6lPxlIFj5BUvpsTyPt3bpui2vOTCcaHjKYuAjSulIPHpd0YFgm5CKLQH6P_19xU1UPeu6avWeIMWA0u4hxyx4TazCfFF9ESCwHaOEcKZfRyJSD2b5p2IJvT0eVkFFExeWbqAcWrH80jIQ-VrEhAvUSpbmH3rB4LEKn-cZtsYGhQxFpeco4U1rUtwe-ncAttqLBnSgQ", "reviews": [{"aspects":[{"rating":3,"type":"quality"}], "author_name":"Greg Magnusson", "author_url": "https://plus.google.com/117934275405882297051", "text": "Truly outstanding local craft brewing company. Indy's got some great local brewers, but these guys really get it right. Nice little location in Carmel, great beer and local guest taps... I'm so glad these guys moved into town. Love!", "time":1361059887 } ], "types":["food","establishment"], "url": "https://plus.google.com/102928473191458623183/about?hl=en-US", "utc_offset":-300, "vicinity": "Suite Q, 622 South Rangeline Road, Carmel", "website":"http://www.unionbrewingco.com" }

Page 12: Scala Json Features and Performance

TESTING PROCESS

• Timing is done with Java System.nanoTime().

• For each data set, each line is parsed.

• This is repeated 25 times to warm up the JVM.

• It is then repeated 200 times for measurement.

• For example, Google has 138 Json lines, so a total of 3450 lines are parsed during warmup and 27600 lines during measurement.

• For each parser, the nanoseconds summed over all measured parse steps (27600 for Google) are reported in milliseconds.
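The process above can be sketched as a small harness that takes any parser as a `String => Any` function. This is a minimal sketch under that assumption, not the json-perf project's code; `ParseBenchmark`, `run` and the `blackhole` field are hypothetical names:

```scala
object ParseBenchmark {
  final case class Result(measuredParses: Long, totalMillis: Double)

  // Every result is written here so the JIT cannot treat the parse as dead code.
  @volatile private var blackhole: Any = null

  def run(lines: Seq[String],
          parse: String => Any,
          warmupPasses: Int = 25,
          measuredPasses: Int = 200): Result = {
    // Warmup: parse every line, ignoring the time taken.
    for (_ <- 1 to warmupPasses; line <- lines) blackhole = parse(line)

    // Measurement: time each individual parse and sum the nanoseconds.
    var totalNanos = 0L
    var count = 0L
    for (_ <- 1 to measuredPasses; line <- lines) {
      val start = System.nanoTime()
      blackhole = parse(line)
      totalNanos += System.nanoTime() - start
      count += 1
    }
    Result(count, totalNanos / 1e6) // total reported in milliseconds
  }
}
```

Timing each parse separately adds the cost of two nanoTime() calls to every sample; timing a whole pass and summing per pass is the usual alternative when individual parses are very short.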

Page 13: Scala Json Features and Performance

TIMING SCALA/JAVA CODE

• Timing is tricky! For example, see

• http://www.ibm.com/developerworks/library/j-benchmark1/

• A few of the many issues:

• Warmup (run several times to warm JVM)

• Repeatability (use the average? what about P99?)

• Interference from other processes

• Caches

• Garbage collection

• Chosen data set
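The "average or P99?" question matters because the two summaries react differently to outliers such as a garbage collection pause. A minimal sketch, assuming per-parse timings have been collected into a sequence (the `LatencyStats` name is hypothetical; the percentile uses the nearest-rank definition):

```scala
object LatencyStats {
  /** Arithmetic mean of the samples. */
  def mean(samples: Seq[Long]): Double = samples.sum.toDouble / samples.size

  /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
  def percentile(samples: Seq[Long], p: Double): Long = {
    require(samples.nonEmpty && p > 0 && p <= 100, "need samples and 0 < p <= 100")
    val sorted = samples.sorted
    val rank = math.ceil(p * sorted.size / 100.0).toInt // 1-based rank
    sorted(rank - 1)
  }
}
```

With the 100 samples 1 to 100, the mean is 50.5 and P99 is 99. Replacing the last sample with 10000 (one long pause) roughly triples the mean to 149.5 while the nearest-rank P99 of 100 samples stays at 99, which is why reporting both is more informative than either alone.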

Page 14: Scala Json Features and Performance

TESTING MACHINE

• Times obviously depend on speed of machine used in testing.

• Numbers here are for a MacBook Pro with

• two 2.9 GHz cores

• 16GB of main memory

• You can run tests on a machine of your choice!

Page 15: Scala Json Features and Performance

PARSING TIMES (MS)

Parser | Twitter | Google | Ignore
Persist Json | 443 | 712
Rojoma | 540 | 1251
Jackson | 445 | 842
Spray Json | 603 | 1115
Lift Json | 469 | 1002
Twitter Json | 18179 | 42316 | Too Slow
Scala Library | 126006 | 329215 | Way Too Slow
Play Json | 442 | 1027
Json Smart | 251 | 424
Argonaut | 784 | 1448
JAWN | 603 | 748

Page 16: Scala Json Features and Performance

PARSING TIMES - TWITTER

Page 17: Scala Json Features and Performance

PARSING TIMES - GOOGLE

Page 18: Scala Json Features and Performance

WHY IS TWITTER SLOW?

• Parser combinators: elegant but slow.

• Interpreted, with backtracking.


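import scala.util.parsing.combinator.JavaTokenParsers

// Combinator grammar in the style of the Twitter parser (needs scala-parser-combinators).
object JsonParser extends JavaTokenParsers {
  def value: Parser[Any] = obj | arr | string | number |
    "null" ^^ (_ => null) | "true" ^^ (_ => true) | "false" ^^ (_ => false)
  def obj: Parser[Map[String, Any]] = "{" ~> repsep(member, ",") <~ "}" ^^ (_.toMap)
  def arr: Parser[List[Any]] = "[" ~> repsep(value, ",") <~ "]"
  def member: Parser[(String, Any)] = string ~ ":" ~ value ^^ { case name ~ ":" ~ v => (name, v) }
  // Helpers the fragment assumed: quoted string (quotes stripped, escapes not decoded) and number.
  def string: Parser[String] = stringLiteral ^^ (s => s.substring(1, s.length - 1))
  def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)
}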

Page 19: Scala Json Features and Performance


WHY IS THE SCALA LIBRARY EVEN SLOWER?

• Like Twitter, it uses parser combinators.

• But why is it so much slower?

Page 20: Scala Json Features and Performance

WHY IS PLAY SO SLOW IF IT USES JACKSON?

• It uses Jerkson (which is abandoned)?

• ???

Page 21: Scala Json Features and Performance

JSON LANGUAGE EXTENSIONS

Parser | Comments | No Quotes | Root Type | Other
Persist Json | // | field | any | raw strings
Rojoma | //, /**/ | field | any | keeps field order
Jackson | // | field | object | single quotes (') allowed
Spray Json | // | - | any | -
Lift Json | - | - | object | keeps field order
Twitter Json | - | - | any | -
Scala Library | - | - | any | -
Play Json | - | - | object | -
Json Smart | # | field/value | object | -
Argonaut | - | - | object | keeps field order
JAWN | - | - | object | -

Page 22: Scala Json Features and Performance

PARSER RESULTS (ASTS)

Parser | Object, Array | Wrapped in Object | Immutable | Collections
Persist Json | Map, List | no | yes | Scala
Rojoma | LinkedHashMap, Vector | yes | no | Scala
Jackson | Map, List | yes | no | Java
Spray Json | Map, Vector | yes | yes | Scala
Lift Json | List[Field], List | yes | yes | Scala
Twitter Json | Map, List | no | yes | Scala
Scala Library | Map, List | yes | yes | Scala
Play Json | Map, List | yes | yes | Scala
Json Smart | HashMap, List | yes | no | Java
Argonaut | scalaz.InsertionMap, List | yes | yes | Scala
JAWN | Map, Array | yes | no | Scala

Page 23: Scala Json Features and Performance

UNPARSING

• The inverse of parsing (deserialization) is unparsing (serialization).

• Unparsing takes the AST from parsing and converts it back to a string.

• Useful for debugging and logging.

• Many parsers also include a pretty printed unparser.

• Times reported here are for the compact (non-pretty) form.
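For a parser whose AST is made of raw Scala values (as with Persist Json's Map and List), compact unparsing is one recursive walk that appends to a buffer. A minimal sketch, assuming the AST contains only immutable Map, Seq, String, numbers, Boolean and null; `CompactUnparser` is a hypothetical name, not any library's API:

```scala
object CompactUnparser {
  /** Serializes a raw AST (Map, Seq, String, Number, Boolean, null) to compact Json. */
  def unparse(value: Any): String = {
    val sb = new StringBuilder
    write(value, sb)
    sb.toString
  }

  private def write(value: Any, sb: StringBuilder): Unit = value match {
    case null                         => sb.append("null")
    case b: Boolean                   => sb.append(b)
    case n: Number if isNonFinite(n)  => throw new IllegalArgumentException(s"not valid Json: $n")
    case n: Number                    => sb.append(n.toString)
    case s: String                    => writeString(s, sb)
    case m: Map[_, _] =>
      sb.append('{')
      var first = true
      for ((k, v) <- m) {
        if (!first) sb.append(',')
        first = false
        writeString(k.toString, sb)
        sb.append(':')
        write(v, sb)
      }
      sb.append('}')
    case xs: Seq[_] =>
      sb.append('[')
      var first = true
      for (x <- xs) {
        if (!first) sb.append(',')
        first = false
        write(x, sb)
      }
      sb.append(']')
    case other => throw new IllegalArgumentException(s"not a Json value: $other")
  }

  // NaN and the infinities have no Json representation.
  private def isNonFinite(n: Number): Boolean = n match {
    case d: java.lang.Double => d.isNaN || d.isInfinite
    case f: java.lang.Float  => f.isNaN || f.isInfinite
    case _                   => false
  }

  private def writeString(s: String, sb: StringBuilder): Unit = {
    sb.append('"')
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u").append("%04x".format(c.toInt)) // other control chars
      case c            => sb.append(c)
    }
    sb.append('"')
  }
}
```

A pretty-printing variant would thread an indentation level through `write`; the compact form is what the timings above measure.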

Page 24: Scala Json Features and Performance

UN-PARSING TIMES (MS)

Parser | Twitter | Google
Persist Json | 622 | 1172
Rojoma | 226 | 511
Jackson | 11 | 29
Spray Json | 232 | 676
Lift Json | 1125 | 3211
Play Json | 322 | 323
Json Smart | 349 | 934
Argonaut | 1005 | 2468
JAWN | 498 | 1161

Page 25: Scala Json Features and Performance

UN-PARSING TIMES - TWITTER

Page 26: Scala Json Features and Performance

UN-PARSING TIMES - GOOGLE

Page 27: Scala Json Features and Performance

WHY IS JACKSON SO INCREDIBLY FAST?

• Uses SegmentedStringWriter (rather than StringBuilder).

• Uses segmented internal buffer.

• Buffers are recycled.
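The two ideas above, a segmented buffer and recycling, can be illustrated in a few lines. This is a simplified sketch of the concept, not Jackson's implementation; `SegmentedBuffer` and `BufferPool` are hypothetical names. Growing adds a fixed-size segment instead of copying everything into a larger array (as a StringBuilder does), and one buffer per thread is reused across serializations:

```scala
import scala.collection.mutable.ArrayBuffer

/** Hypothetical segmented char buffer: growth allocates a new segment, never re-copies old text. */
final class SegmentedBuffer(segmentSize: Int = 1024) {
  private val filled = ArrayBuffer.empty[Array[Char]]
  private var current = new Array[Char](segmentSize)
  private var pos = 0

  def append(s: String): this.type = {
    var i = 0
    while (i < s.length) {
      if (pos == segmentSize) { // current segment is full: start a new one
        filled += current
        current = new Array[Char](segmentSize)
        pos = 0
      }
      val n = math.min(s.length - i, segmentSize - pos)
      s.getChars(i, i + n, current, pos)
      pos += n
      i += n
    }
    this
  }

  /** Copies all segments into one String in a single final pass. */
  def contentsAsString(): String = {
    val out = new Array[Char](filled.length * segmentSize + pos)
    var off = 0
    for (seg <- filled) { System.arraycopy(seg, 0, out, off, segmentSize); off += segmentSize }
    System.arraycopy(current, 0, out, off, pos)
    new String(out)
  }

  /** Empties the buffer but keeps the current segment for reuse. */
  def reset(): Unit = { filled.clear(); pos = 0 }
}

object BufferPool {
  // One recycled buffer per thread, so repeated serializations reuse memory.
  private val recycled = new ThreadLocal[SegmentedBuffer] {
    override def initialValue(): SegmentedBuffer = new SegmentedBuffer()
  }

  /** Runs `f` with this thread's recycled buffer (not re-entrant). */
  def withBuffer[A](f: SegmentedBuffer => A): A = {
    val buf = recycled.get()
    buf.reset()
    f(buf)
  }
}
```

Because the pooled buffer is shared within a thread, a nested call to `withBuffer` would clobber the outer one; a production design needs a way to detect or avoid that.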

Page 28: Scala Json Features and Performance

WHY IS PERSIST SLOW?

• Uses raw Seq and Map rather than values wrapped in custom classes.

• Must use pattern matching rather than dispatching to a virtual method.
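With raw Seq and Map values, an unparser has to test each value's runtime type with a `match`. A wrapped AST can instead give every node class its own write method, so each step is a single virtual call. A minimal sketch of the wrapped style; `JValue`, `JObj` and the other node names are hypothetical and not any particular library's classes, and whether this is faster in practice depends on the JIT:

```scala
/** Hypothetical wrapped AST: each node type knows how to write itself. */
sealed trait JValue {
  def writeTo(sb: StringBuilder): Unit // overridden per node type (virtual dispatch)
  final def compact: String = { val sb = new StringBuilder; writeTo(sb); sb.toString }
}

case object JNull extends JValue { def writeTo(sb: StringBuilder): Unit = sb.append("null") }
final case class JBool(value: Boolean) extends JValue { def writeTo(sb: StringBuilder): Unit = sb.append(value) }
final case class JNum(value: BigDecimal) extends JValue { def writeTo(sb: StringBuilder): Unit = sb.append(value.toString) }

final case class JStr(value: String) extends JValue {
  def writeTo(sb: StringBuilder): Unit = {
    sb.append('"')
    value.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u").append("%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"')
  }
}

final case class JArr(items: Vector[JValue]) extends JValue {
  def writeTo(sb: StringBuilder): Unit = {
    sb.append('[')
    for ((item, i) <- items.zipWithIndex) {
      if (i > 0) sb.append(',')
      item.writeTo(sb) // no type test: the item writes itself
    }
    sb.append(']')
  }
}

final case class JObj(fields: Vector[(String, JValue)]) extends JValue {
  def writeTo(sb: StringBuilder): Unit = {
    sb.append('{')
    for (((key, v), i) <- fields.zipWithIndex) {
      if (i > 0) sb.append(',')
      JStr(key).writeTo(sb)
      sb.append(':')
      v.writeTo(sb)
    }
    sb.append('}')
  }
}
```

Making the trait `sealed` also lets the compiler warn when a `match` over `JValue` misses a case, something a raw `Any` AST cannot offer.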

Page 29: Scala Json Features and Performance

MAPPERS

• Parsers go from string to AST

• Mappers go from string to user-specified case classes

• Twitter, Scala Library, Json Smart, JAWN: no mapper

• Jackson, Argonaut, Rojoma: string => case classes

• Others: string => AST => case classes

Page 30: Scala Json Features and Performance

DYNAMIC VERSUS STATIC TYPING

• Dynamic: AST. More flexible and agile. No additional code is needed for parsing, and it works on any valid Json data. But extra code is needed if more checking is required.

• Static: user-specified case classes. The case classes must be defined before parsing can proceed. More checking. Behavior can be attached to the case classes.
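The "string => AST => case classes" route means writing the second step by hand, including the extra checking. A minimal sketch, assuming an AST of raw Scala values such as Persist Json's Map; `TwitterUser` and `TwitterUserMapper` are hypothetical, with fields taken from the `user` object of the sample tweet:

```scala
/** Hypothetical target type: two fields from the "user" object of a tweet. */
final case class TwitterUser(screenName: String, followersCount: Int)

object TwitterUserMapper {
  /** Maps a raw AST node to a TwitterUser, or explains why it cannot. */
  def fromAst(ast: Any): Either[String, TwitterUser] = ast match {
    case m: Map[_, _] =>
      val fields = m.asInstanceOf[Map[String, Any]] // Json object keys are strings
      (fields.get("screen_name"), fields.get("followers_count")) match {
        // Any numeric AST type (Int, Long, BigDecimal, ...) is a java.lang.Number.
        case (Some(name: String), Some(n: Number)) => Right(TwitterUser(name, n.intValue()))
        case (name, count) =>
          Left(s"unexpected fields: screen_name=$name, followers_count=$count")
      }
    case other => Left(s"expected a Json object, got $other")
  }
}
```

Unknown fields such as `lang` are simply ignored; a stricter mapper could reject them instead.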

Page 31: Scala Json Features and Performance

MAPPING TIMES (MS)

Parser | Twitter | Google
Persist Json | 622 | 2238
Rojoma | 1117 | 2669
Jackson | 326 | 1150
Spray Json | 557 | 1675
Lift Json | 520 | 2060
Play Json | 1123 | 3768
Argonaut | 937 | 2550

Page 32: Scala Json Features and Performance

MAPPING TIMES - TWITTER

Page 33: Scala Json Features and Performance

MAPPING TIMES - GOOGLE

Page 34: Scala Json Features and Performance

MAPPERS

Parser | Extra Code Lines | Why
Persist Json | 0 | -
Rojoma | 135 | case classes, Array=>Seq
Jackson | 0 | -
Spray Json | 16 | case classes
Lift Json | 7 | BigDecimal
Play Json | 16 | case classes
Argonaut | 180 | case classes, Array=>List, Seq=>List, BigDecimal=>Double

Page 35: Scala Json Features and Performance

AVOIDING EXTRA CODE

• Find the types of case class parameters: Java reflection works.

• Find names of case class parameters. Prior to Java 8 not available via Java reflection. Scala reflection however does work.

• Reflection can be quite slow. Caching can help!

• Persist: Shapeless

• Lift and Jackson: Paranamer, which gets parameter names by reading the symbol tables (debug information) in Java bytecode.
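The first and third points can be combined: Java reflection recovers constructor parameter types, and a cache avoids repeating the slow lookup. A minimal sketch using only the JDK reflection API; `ConstructorTypes` is a hypothetical name, and unlike Paranamer or Shapeless it does not recover parameter names. Note that a case class nested inside another class gets an extra leading constructor parameter for the enclosing instance:

```scala
import scala.collection.concurrent.TrieMap

object ConstructorTypes {
  // Thread-safe cache: reflect on each class only once.
  private val cache = TrieMap.empty[Class[_], Vector[Class[_]]]

  /** Parameter types of the class's first public constructor. */
  def of(cls: Class[_]): Vector[Class[_]] =
    cache.getOrElseUpdate(cls, {
      val ctor = cls.getConstructors.head // case classes normally have one public constructor
      ctor.getParameterTypes.toVector
    })
}
```

For `case class Place(name: String, rating: Int)` the last two entries are `classOf[String]` and `classOf[Int]` (the primitive `int`), and a second lookup returns the cached vector.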

Page 36: Scala Json Features and Performance

SUMMARY

• Avoid: Scala Library, Twitter

• Fast parse and no other features: Json Smart

• Good overall choices: Jackson, Persist, Spray

• Very fast unparse: Jackson

Page 37: Scala Json Features and Performance

QUESTIONS

Page 38: Scala Json Features and Performance

THANKS!


To contact me or 47 Degrees:

[email protected]

[email protected]

Twitter @47deg

Web: 47deg.com