scala json features and performance
SCALA JSON FEATURES AND PERFORMANCE
John Nestor, 47 Degrees
Dragos Manolescu
https://github.com/47deg/json-perf
47deg.com
DISCLAIMER
• Best-effort attempt to measure performance and describe features.
• Corrections are always appreciated.
• Also let us know of any Json parsers we missed.
SCALA JSON
• There are lots of Scala Json parsers
• You can also use Java Json parsers in Scala
• How to choose:
  • Performance
  • Features
  • API
  • Support (will not be abandoned)
  • License (most are Apache 2)
SCALA (2.11) JSON PARSERS
Parser         Version  Language    URL
Persist Json   1.1.0    Scala       https://github.com/nestorpersist/json
Rojoma         3.3.0    Scala       https://github.com/rjmac/rojoma-json
Jackson        2.5.3    Scala/Java  http://wiki.fasterxml.com/JacksonHome
Spray Json     1.3.2    Scala       https://github.com/spray/spray-json
Lift Json      2.6.2    Scala       https://github.com/lift/lift/tree/master/framework/lift-base/lift-json/
Twitter Json   NA       Scala       https://github.com/stevej/scala-json
Scala Library  1.0.4    Scala       https://github.com/scala/scala-parser-combinators
Play Json      2.4.1    Scala/Java  https://www.playframework.com/documentation/2.0/ScalaJson
Json Smart     2.1.0    Java        https://github.com/netplex/json-smart-v2
Argonaut       6.0.4    Scala       http://argonaut.io/
JAWN           0.8.0    Scala       https://github.com/non/jawn
THE PARSERS (1 OF 4)
• Scala Library. This parser is part of the standard Scala library in package scala.util.parsing.json. It is implemented using parsing combinators.
• Twitter Json. A cleaned up version of the JSON parser in Odersky's Scala book. It is implemented using parsing combinators. Written by Steve Jenson while at Twitter.
• Persist Json. Developed as part of OStore, a new NoSQL database written in Scala. OStore started with the Twitter parser, which turned out to be much too slow, so it was rewritten from scratch, keeping mostly the same API but with an emphasis on speed. Developed by John Nestor (with the codex based mapper by JR Dejardin).
THE PARSERS (2 OF 4)
• Play Json. A part of the Typesafe Play framework. Implemented using Jerkson, a Scala wrapper on Jackson.
• Lift Json. Developed as part of Lift, a framework for building web apps.
• Spray Json. Developed as part of Spray, a REST/HTTP network IO toolkit.
THE PARSERS (3 OF 4)
• Argonaut. Purely functional Json in Scala. Uses Scalaz.
• Rojoma. Another Scala parser that makes extensive use of Scala’s functional features. Developed by Robert Macomber of Socrata.
• Jawn. Jawn was designed to parse JSON into an AST as quickly as possible.
THE PARSERS (4 OF 4)
• Jackson. Generally regarded as the best and fastest Java Json parser. Has a very rich set of features. We test using the DefaultScalaModule (by Chris Currie) that provides Scala support.
• Json Smart. A newer Json parser written in Java that is faster than Jackson.
TEST SETS FOR PERFORMANCE TESTING
• Twitter. Tweets processed by the Yap.tv Guide (http://j.mp/15WL0p3), a service providing a personalized TV guide companion experience based on social content from Twitter and Facebook. This data set contains 100 tweets in Json (http://j.mp/13lKbU6).
• Google. PlaceSearchResults returned by Google in response to place queries at 100 locations. The locations correspond to the best places to live in 2012, as compiled by CNN Money (http://j.mp/13NmVid). This data set contains 138 PlaceSearchResults in Json (http://j.mp/13NmCUC), obtained using the keyword “brewery” and a radius of 2 miles.
• Each file has one Json object per line.
PRETTY SAMPLE TWITTER JSON
{"contributors":null,
 "coordinates":null,
 "created_at":"Mon Jun 27 21:45:46 +0000 2011",
 "entities":
  {"hashtags":[],
   "urls":
    [{"display_url":"mercynotes.com",
      "expanded_url":"http://www.mercynotes.com/",
      "indices":[61,80],
      "url":"http://t.co/lKzLFOd"
     }
    ],
   "user_mentions":[]
  },
 "favorited":false,
 "geo":null,
 "id":85463859615379456,
 "id_str":"85463859615379456",
 "in_reply_to_screen_name":null,
 "in_reply_to_status_id":null,
 "in_reply_to_status_id_str":null,
 "in_reply_to_user_id":null,
 "in_reply_to_user_id_str":null,
 "place":null,
 "retweet_count":0,
 "retweeted":false,
 "source":"web",
 "text":
  "Been watching Wimbledon? Check out new post Love and Tennis: http://t.co/lKzLFOd",
 "truncated":false,
 "user":
  {"contributors_enabled":false,
   "created_at":"Mon May 30 16:35:44 +0000 2011",
   "default_profile":true,
   "default_profile_image":false,
   "description":"",
   "favourites_count":0,
   "follow_request_sent":null,
   "followers_count":6,
   "following":null,
   "friends_count":12,
   "geo_enabled":false,
   "id":307978890,
   "id_str":"307978890",
   "is_translator":false,
   "lang":"en",
   "listed_count":0,
   "location":"NC",
   "name":"Julie LaJoe",
   "notifications":null,
   "profile_background_color":"C0DEED",
   "profile_background_image_url":
    "http://a0.twimg.com/images/themes/theme1/bg.png",
   "profile_background_image_url_https":
    "https://si0.twimg.com/images/themes/theme1/bg.png",
   "profile_background_tile":false,
   "profile_image_url":
    "http://a0.twimg.com/profile_images/1375001769/JulieMNnew__2__normal.jpg",
   "profile_image_url_https":
    "https://si0.twimg.com/profile_images/1375001769/JulieMNnew__2__normal.jpg",
   "profile_link_color":"0084B4",
   "profile_sidebar_border_color":"C0DEED",
   "profile_sidebar_fill_color":"DDEEF6",
   "profile_text_color":"333333",
   "profile_use_background_image":true,
   "protected":false,
   "screen_name":"mercynotes",
   "show_all_inline_media":false,
   "statuses_count":13,
   "time_zone":"Quito",
   "url":"http://mercynotes.com",
   "utc_offset":-18000,
   "verified":false
  }
}
PRETTY SAMPLE GOOGLE JSON
{"address_components":
  [{"long_name":"622",
    "short_name":"622",
    "types":["street_number"]
   },
   {"long_name":"South Rangeline Road",
    "short_name":"South Rangeline Road",
    "types":["route"]
   },
   {"long_name":"Carmel",
    "short_name":"Carmel",
    "types":["locality","political"]
   },
   {"long_name":"Hamilton",
    "short_name":"Hamilton",
    "types":
     ["administrative_area_level_2","political"]
   },
   {"long_name":"Indiana",
    "short_name":"Indiana",
    "types":
     ["administrative_area_level_1","political"]
   },
   {"long_name":"US",
    "short_name":"US",
    "types":["country","political"]
   },
   {"long_name":"46032",
    "short_name":"46032",
    "types":["postal_code"]
   }
  ],
 "formatted_address":
  "Suite Q, 622 South Rangeline Road, Carmel, Indiana, United States",
 "formatted_phone_number":"(317) 429-6345",
 "geometry":
  {"location":{"lat":39.971703,"lng":-86.129099}},
 "icon":
  "http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
 "id":"fcd83d32717980ec1fec2c7ec8719389b201a331",
 "international_phone_number":"+1 317-429-6345",
 "name":"Union Brewing Company",
 "opening_hours":
  {"open_now":true,
   "periods":
    [{"close":{"day":0,"time":"2000"},
      "open":{"day":0,"time":"1200"}
     },
     {"close":{"day":2,"time":"2200"},
      "open":{"day":2,"time":"1600"}
     },
     {"close":{"day":4,"time":"2200"},
      "open":{"day":4,"time":"1600"}
     },
     {"close":{"day":6,"time":"0000"},
      "open":{"day":5,"time":"1500"}
     },
     {"close":{"day":0,"time":"0000"},
      "open":{"day":6,"time":"1200"}
     }
    ]
  },
 "photos":
  [{"height":1632,
    "html_attributions":
     ["<a href=\"https://plus.google.com/117934275405882297051\">Greg Magnusson</a>"
     ],
    "photo_reference":
     "CnRoAAAAvN9y_gkgZIGa13kUSyyBlqwholvjtH4NKo-BzvlklcX-Tt9Ysc6HRMXPxKl3PumZtiOnomHi-Nk83y-lxf8RX8nsWulwuCBpY2okAqaU9wohOhncStFPZlKr02t3WquA6pt8mfCYYO-NAdU2HwdM1hIQYJmus4wpQBaRtP7BFdYhzRoU4XvzfAAQQwkdJZluFJ-tDoUulIo",
    "width":1224
   }
  ],
 "reference":
  "CoQBcgAAAF3VKrWBUmLMv5tLs1Ru47j3Tbxa6lPxlIFj5BUvpsTyPt3bpui2vOTCcaHjKYuAjSulIPHpd0YFgm5CKLQH6P_19xU1UPeu6avWeIMWA0u4hxyx4TazCfFF9ESCwHaOEcKZfRyJSD2b5p2IJvT0eVkFFExeWbqAcWrH80jIQ-VrEhAvUSpbmH3rB4LEKn-cZtsYGhQxFpeco4U1rUtwe-ncAttqLBnSgQ",
 "reviews":
  [{"aspects":[{"rating":3,"type":"quality"}],
    "author_name":"Greg Magnusson",
    "author_url":
     "https://plus.google.com/117934275405882297051",
    "text":
     "Truly outstanding local craft brewing company. Indy's got some great local brewers, but these guys really get it right. Nice little location in Carmel, great beer and local guest taps... I'm so glad these guys moved into town. Love!",
    "time":1361059887
   }
  ],
 "types":["food","establishment"],
 "url":
  "https://plus.google.com/102928473191458623183/about?hl=en-US",
 "utc_offset":-300,
 "vicinity":
  "Suite Q, 622 South Rangeline Road, Carmel",
 "website":"http://www.unionbrewingco.com"
}
TESTING PROCESS
• Timing is done with Java System.nanoTime().
• For each data set, each line is processed.
• This is repeated 25 times to warm up the JVM.
• This is then repeated 200 times for measurement.
• For example, Google has 138 Json lines, so during warmup a total of 3450 lines are parsed and during testing 27600 lines are parsed.
• The total summed nanoseconds for all 27600 parse steps are reported as milliseconds for each parser.
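The testing process described on this slide can be sketched roughly as follows. This is a minimal illustration, not the code from the json-perf repository: the object name `TimingSketch`, the helpers `timeParses` and `benchmarkMs`, and the stand-in "parser" are all assumptions made for the example. With 138 lines, the warmup performs 138 × 25 = 3450 parses and the measured phase 138 × 200 = 27600 parses.

```scala
object TimingSketch {
  // Hypothetical helper: sums the nanoseconds spent inside `parse`
  // over every line, with the whole data set repeated `reps` times.
  def timeParses(lines: Seq[String], reps: Int)(parse: String => Any): Long = {
    var totalNanos = 0L
    var i = 0
    while (i < reps) {
      for (line <- lines) {
        val start = System.nanoTime()
        parse(line)
        totalNanos += System.nanoTime() - start
      }
      i += 1
    }
    totalNanos
  }

  // Warm up first (timing discarded), then measure and report milliseconds.
  def benchmarkMs(lines: Seq[String], warmupReps: Int = 25, measuredReps: Int = 200)
                 (parse: String => Any): Long = {
    timeParses(lines, warmupReps)(parse)
    timeParses(lines, measuredReps)(parse) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    // 138 identical lines stand in for the Google data set.
    val lines = Seq.fill(138)("""{"name":"Union Brewing Company"}""")
    // `_.length` is a placeholder for a real parser call.
    val ms = benchmarkMs(lines)(_.length)
    println(s"measured parses: ${lines.size * 200}, total ms: $ms")
  }
}
```

To time a real parser, the placeholder function would be replaced by that parser's own parse call.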
TIMING SCALA/JAVA CODE
• Timing is tricky! For example, see http://www.ibm.com/developerworks/library/j-benchmark1/
• A few of the many issues:
  • Warmup (run several times to warm up the JVM)
  • Repeatability (use the average? but what about P99?)
  • Interference from other processes
  • Caches
  • Garbage collection
  • Chosen data set
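The repeatability question (average versus P99) can be made concrete with a small sketch. The object `LatencyStats` and the sample values are made up for illustration and are not measurements from the talk; the percentile uses the nearest-rank definition, which is only one of several common choices.

```scala
object LatencyStats {
  // Arithmetic mean of the samples.
  def mean(samples: Seq[Long]): Double = samples.sum.toDouble / samples.size

  // Nearest-rank percentile: the smallest sample such that at least
  // p percent of all samples are less than or equal to it.
  def percentile(samples: Seq[Long], p: Double): Long = {
    require(samples.nonEmpty && p > 0 && p <= 100)
    val sorted = samples.sorted
    val rank = math.ceil(p * sorted.size / 100.0).toInt
    sorted(rank - 1)
  }

  def main(args: Array[String]): Unit = {
    // Made-up timings: 98 fast runs and 2 slow ones (for example, GC pauses).
    val samples = Seq.fill(98)(10L) ++ Seq.fill(2)(1000L)
    println(s"mean=${mean(samples)} p50=${percentile(samples, 50)} p99=${percentile(samples, 99)}")
  }
}
```

With these made-up samples the median stays at the fast value while the mean is pulled upward and P99 lands on a slow run, which is why a single summed or averaged number can hide outliers.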
TESTING MACHINE
• Times obviously depend on speed of machine used in testing.
• Numbers here are for a MacBook Pro with:
  • two 2.9 GHz cores
  • 16 GB of main memory
• You can run tests on a machine of your choice!
PARSING TIMES (MS)
Parser          Twitter   Google  Ignore
Persist Json        443      712
Rojoma              540     1251
Jackson             445      842
Spray Json          603     1115
Lift Json           469     1002
Twitter Json      18179    42316  Too Slow
Scala Library    126006   329215  Way Too Slow
Play Json           442     1027
Json Smart          251      424
Argonaut            784     1448
JAWN                603      748
WHY IS TWITTER SLOW?
• Parsing combinators: elegant but slow.
• The grammar is interpreted at run time and relies on backtracking.
def value: Parser[Any] = obj | arr | string | number |
  "null" ^^ (x => null) | "true" ^^ (x => true) | "false" ^^ (x => false)
def obj: Parser[Map[String, Any]] = "{" ~> repsep(member, ",") <~ "}" ^^ (Map.empty ++ _)
def arr: Parser[List[Any]] = "[" ~> repsep(value, ",") <~ "]"
def member: Parser[(String, Any)] = string ~ ":" ~ value ^^ { case name ~ ":" ~ value => (name, value) }
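To make the cost of backtracking visible, here is a deliberately tiny, hand-rolled combinator sketch. It is not the scala-parser-combinators implementation; the names `P`, `lit`, `alt`, and the `attempts` counter are hypothetical and exist only for this illustration. Each alternative is tried in order from the same input position, so a value near the end of a chain of alternatives pays for every failed attempt before it.

```scala
object BacktrackingSketch {
  // A parser maps (input, position) to an optional (result, next position).
  type P[A] = (String, Int) => Option[(A, Int)]

  // Counts literal match attempts; illustration only, not thread-safe.
  var attempts = 0

  // Matches the literal `s` at the current position.
  def lit(s: String): P[String] = (in, pos) => {
    attempts += 1
    if (in.startsWith(s, pos)) Some((s, pos + s.length)) else None
  }

  // Ordered choice: try `a`; if it fails, restart `b` from the same position.
  def alt[A](a: P[A], b: P[A]): P[A] = (in, pos) => a(in, pos).orElse(b(in, pos))

  // Analogous to the "null" | "true" | "false" alternatives in the grammar above.
  val keyword: P[String] = alt(lit("null"), alt(lit("true"), lit("false")))

  def main(args: Array[String]): Unit = {
    attempts = 0
    println(keyword("false", 0)) // succeeds only after "null" and "true" fail
    println(s"literal attempts: $attempts")
  }
}
```

A hand-written parser can usually decide which alternative applies by looking at the next character, avoiding the failed attempts entirely.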
WHY IS THE SCALA LIBRARY EVEN SLOWER?
• Like Twitter Json, it uses parsing combinators.
• But why is it so much slower?
WHY IS PLAY SO SLOW IF IT USES JACKSON?
• It uses Jerkson (which is abandoned)?
• ???
JSON LANGUAGE EXTENSIONS
Parser         Comments  NoQuotes     Root Type  Other
Persist Json   //        field        any        raw strings
Rojoma         //,/**/   field        any        keeps field order
Jackson        //        field        object     can use ' quotes
Spray Json     //        -            any        -
Lift Json      -         -            object     keeps field order
Twitter Json   -         -            any        -
Scala Library  -         -            any        -
Play Json      -         -            object     -
Json Smart     #         field/value  object     -
Argonaut       -         -            object     keeps field order
JAWN           -         -            object     -
PARSER RESULTS (ASTS)
Parser         Object, Array              Wrapped in Object  Immutable  Collections
Persist Json   Map, List                  no                 yes        Scala
Rojoma         LinkedHashMap, Vector      yes                no         Scala
Jackson        Map, List                  yes                no         Java
Spray Json     Map, Vector                yes                yes        Scala
Lift Json      List[Field], List          yes                yes        Scala
Twitter Json   Map, List                  no                 yes        Scala
Scala Library  Map, List                  yes                yes        Scala
Play Json      Map, List                  yes                yes        Scala
Json Smart     HashMap, List              yes                no         Java
Argonaut       scalaz.InsertionMap, List  yes                yes        Scala
JAWN           Map, Array                 yes                no         Scala
UNPARSING
• The inverse of parsing (deserialization) is unparsing (serialization).
• Unparsing takes the AST from parsing and converts it back to a string.
• Useful for debugging and logging.
• Many parsers also include a pretty printed unparser.
• Timings here are for the simple “non-pretty” form.
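As an illustration of what a simple (non-pretty) unparser does, here is a minimal sketch over an AST made of raw Scala values (Map, Seq, String, numbers, Boolean, null). It is not taken from any of the parsers compared here; `UnparseSketch` and its helpers are hypothetical names, and non-finite numbers are not handled.

```scala
object UnparseSketch {
  // Appends `s` as a quoted JSON string, escaping quotes, backslashes,
  // and control characters.
  private def quote(s: String, sb: StringBuilder): Unit = {
    sb.append('"')
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' =>
        // Remaining control characters become a four-digit hex escape.
        sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"')
  }

  // Walks the AST and appends compact JSON text with no extra whitespace.
  private def write(v: Any, sb: StringBuilder): Unit = v match {
    case null                => sb.append("null")
    case s: String           => quote(s, sb)
    case b: Boolean          => sb.append(b)
    case n: java.lang.Number => sb.append(n.toString)
    case m: scala.collection.Map[_, _] =>
      sb.append('{')
      var first = true
      m.foreach { case (k, value) =>
        if (!first) sb.append(',')
        first = false
        quote(k.toString, sb)
        sb.append(':')
        write(value, sb)
      }
      sb.append('}')
    case xs: scala.collection.Seq[_] =>
      sb.append('[')
      var first = true
      xs.foreach { x =>
        if (!first) sb.append(',')
        first = false
        write(x, sb)
      }
      sb.append(']')
    case other =>
      throw new IllegalArgumentException("not a JSON value: " + other)
  }

  def unparse(v: Any): String = {
    val sb = new StringBuilder
    write(v, sb)
    sb.toString
  }
}
```

For example, `UnparseSketch.unparse(Map("a" -> List(1, true, null), "b" -> "x"))` produces a single-line Json string with no whitespace between tokens.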
UNPARSING TIMES (MS)
Parser        Twitter  Google
Persist Json      622    1172
Rojoma            226     511
Jackson            11      29
Spray Json        232     676
Lift Json        1125    3211
Play Json         322     323
Json Smart        349     934
Argonaut         1005    2468
JAWN              498    1161
WHY IS JACKSON SO INCREDIBLY FAST?
• Uses SegmentedStringBuilder (rather than StringBuilder).
• Uses segmented internal buffer.
• Buffers are recycled.
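The general idea behind a segmented, recyclable buffer can be sketched as below. This is not Jackson's code: `SegmentedBuilder`, its segment size, and its `clear()` recycling policy are assumptions chosen to illustrate the technique. Text is appended into fixed-size char segments, so growing the buffer never copies what was already written, and `clear()` keeps the current segment for reuse.

```scala
object SegmentedBufferSketch {
  final class SegmentedBuilder(segmentSize: Int = 4096) {
    // Segments that are already full, in the order they were filled.
    private val full = scala.collection.mutable.ArrayBuffer.empty[Array[Char]]
    private var current = new Array[Char](segmentSize)
    private var used = 0

    def append(s: String): this.type = {
      var from = 0
      while (from < s.length) {
        if (used == current.length) {
          // Segment is full: start a new one instead of copying old data.
          full += current
          current = new Array[Char](segmentSize)
          used = 0
        }
        val n = math.min(s.length - from, current.length - used)
        s.getChars(from, from + n, current, used)
        used += n
        from += n
      }
      this
    }

    // Concatenates all segments once, at the very end.
    def result(): String = {
      val out = new java.lang.StringBuilder(full.size * segmentSize + used)
      full.foreach(seg => out.append(seg))
      out.append(current, 0, used)
      out.toString
    }

    // Recycles the builder: drops full segments, keeps the current one.
    def clear(): Unit = {
      full.clear()
      used = 0
    }
  }
}
```

A plain StringBuilder, by contrast, typically reallocates and copies its entire contents each time it outgrows its capacity.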
WHY IS PERSIST SLOW?
• Its AST uses raw Seq and Map values rather than wrapping them in custom classes.
• The unparser must therefore pattern match on each value rather than dispatch to a virtual method.
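The two styles can be contrasted with a small sketch. The types `JValue`, `JNum`, `JArr` and the function `renderRaw` are hypothetical and cover only numbers and arrays; they do not reproduce any library's actual AST.

```scala
object DispatchSketch {
  // Style 1: wrapped AST. Each node class implements a virtual method,
  // so rendering is a single virtual call per node.
  sealed trait JValue { def render: String }
  final case class JNum(value: Long) extends JValue {
    def render: String = value.toString
  }
  final case class JArr(items: List[JValue]) extends JValue {
    def render: String = items.map(_.render).mkString("[", ",", "]")
  }

  // Style 2: raw Scala values. The kind of each node is discovered by
  // testing its runtime type, one case after another.
  def renderRaw(v: Any): String = v match {
    case n: Long     => n.toString
    case xs: List[_] => xs.map(x => renderRaw(x)).mkString("[", ",", "]")
    case other       => throw new IllegalArgumentException("unsupported: " + other)
  }
}
```

Both `JArr(List(JNum(1), JArr(List(JNum(2))))).render` and `renderRaw(List(1L, List(2L)))` yield the same text; the difference lies in how each node's type is resolved at run time.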
MAPPERS
• Parsers go from string to AST
• Mappers go to user specified case classes
• Twitter, Scala Library, Json Smart, JAWN: no mapper
• Jackson, Argonaut, Rojoma: string => case classes
• Others: string => AST => case classes
DYNAMIC VERSUS STATIC TYPING
• Dynamic: AST. More flexible and agile. No additional code is needed for parsing, and it can be used on any valid Json data, but extra code is needed if more checking is required.
• Static: user-specified case classes. Case classes must be specified before parsing can proceed. More checking. Behavior can be attached to the case classes.
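The trade-off can be illustrated with a hand-written sketch that assumes the AST is a raw `Map[String, Any]`. The case class `User`, the helpers `followersDynamic` and `toUser`, and the choice of fields (taken from the sample tweet's user object) are illustrative assumptions; real mappers generate or derive this kind of code rather than requiring it by hand.

```scala
object MappingSketch {
  // Static target type, declared before any parsing happens.
  final case class User(screenName: String, followersCount: Int)

  // Dynamic style: read straight from the AST. No up-front declarations,
  // but the result is untyped and a missing key fails only at this call.
  def followersDynamic(ast: Map[String, Any]): Any = ast("followers_count")

  // Static style: map the AST to a case class, checking its shape once.
  def toUser(ast: Map[String, Any]): Either[String, User] =
    (ast.get("screen_name"), ast.get("followers_count")) match {
      case (Some(name: String), Some(n: Int)) => Right(User(name, n))
      case _ => Left("expected a string screen_name and an integer followers_count")
    }

  def main(args: Array[String]): Unit = {
    val ast: Map[String, Any] = Map("screen_name" -> "mercynotes", "followers_count" -> 6)
    println(followersDynamic(ast))
    println(toUser(ast))
  }
}
```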
MAPPING TIMES (MS)
Parser        Twitter  Google
Persist Json      622    2238
Rojoma           1117    2669
Jackson           326    1150
Spray Json        557    1675
Lift Json         520    2060
Play Json        1123    3768
Argonaut          937    2550
MAPPERS
Parser        Extra Code Lines  Why
Persist Json                 0
Rojoma                     135  case classes, Array=>Seq
Jackson                      0
Spray Json                  16  case classes
Lift Json                    7  BigDecimal
Play Json                   16  case classes
Argonaut                   180  case classes, Array=>List, Seq=>List, BigDecimal=>Double
AVOIDING EXTRA CODE
• Find types of case class parameters. Java reflection works.
• Find names of case class parameters. Prior to Java 8, these were not available via Java reflection. Scala reflection, however, does work.
• Reflection can be quite slow. Caching can help!
• Persist: Shapeless.
• Lift and Jackson: Paranamer, which gets the information by reading Java bytecode symbol tables.
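As a rough sketch of the first and third points, the following looks up the parameter types of a case class constructor through Java reflection and caches the result per class. It is not how any of the listed libraries implement their mappers; `ReflectionCacheSketch`, `Place`, and the `lookups` counter are hypothetical names used only for this example.

```scala
object ReflectionCacheSketch {
  // Example case class; its name and fields are made up for this sketch.
  final case class Place(name: String, rating: Int)

  // One cache entry per class, so the reflective lookup runs only once.
  private val cache =
    new java.util.concurrent.ConcurrentHashMap[Class[_], Seq[Class[_]]]()

  // Counts real reflective lookups; illustration only, not thread-safe.
  var lookups = 0

  // Parameter types of the first public constructor. For a case class this
  // is the primary constructor, whose parameters are the case class fields.
  def constructorParamTypes(cls: Class[_]): Seq[Class[_]] = {
    val cached = cache.get(cls)
    if (cached != null) cached
    else {
      lookups += 1
      val types: Seq[Class[_]] = cls.getConstructors.head.getParameterTypes.toList
      cache.put(cls, types)
      types
    }
  }

  def main(args: Array[String]): Unit = {
    println(constructorParamTypes(classOf[Place]))
    println(constructorParamTypes(classOf[Place])) // served from the cache
    println(s"reflective lookups: $lookups")
  }
}
```

Only the types are recovered here; obtaining the parameter names is the harder problem that the slide attributes to Scala reflection, Shapeless, or Paranamer.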
SUMMARY
• Avoid: Scala Library, Twitter
• Fast parse and no other features: Json Smart
• Good overall choices: Jackson, Persist, Spray
• Very fast unparse: Jackson
THANKS!
QUESTIONS
To contact me or 47 Degrees:
Twitter: @47deg
Web: 47deg.com