Optimizing Ruby's JSON, Part 4

(byroot.github.io)

211 points | by jeremy_k 1 week ago

15 comments

  • LukeShu 4 days ago
    Very glad to see the work that byroot is doing as the new ruby-json maintainer!

    Since I was mentioned by name in part 3, perhaps I can provide some interesting commentary:

    > All this code had recently been rewritten pretty much from scratch by Luke Shumaker ... While this code is very clean and generic, with a good separation of the multiple levels of abstractions, such as bytes and codepoints, that would make it very easy to extend the escaping logic, it isn’t taking advantage of many assumptions convert_UTF8_to_JSON could make to take shortcuts.

    My rewritten version was already slightly faster than the original version, so I didn't feel the need to spend more time optimizing it, at least until the simple version got merged; which I had no idea when that'd be because of silence from the then-maintainer. Every optimization would be an opportunity for more pain when rebasing away merge-conflicts; which was already painful enough the 2 times I had to do it while waiting for a reply.

    > One of these for instance is that there’s no point validating the UTF-8 encoding because Ruby did it for us and it’s impossible to end up inside convert_UTF8_to_JSON with invalid UTF-8.

    I don't care to dig through the history to see exactly what changed when, but: At the time I wrote it, the unit tests told me that wasn't true; if I omitted the checks for invalid UTF-8, then the tests failed.

    > Another is that there are only two multi-byte characters we care about, and both start with the same 0xE2 byte, so the decoding into codepoints is a bit superfluous. ... we can re-use Mame’s lookup table, but with a twist.

    I noted in the original PR description that I thought a lookup table would be faster than my decoder. I didn't use a lookup table myself because (1) I wanted to keep the initial version simple, to make code review simple, to increase the likelihood that it got merged, and (2) the old proprietary CVTUTF code used a lookup table, and because I was so familiar with the CVTUTF code, I didn't feel comfortable being the one to re-add a lookup table. Glad to see that my suspicion was correct and that someone else did the work!
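
    For readers who want to picture the lookup-table idea from the quote above, here is a minimal sketch. It is not the ruby/json code: the names `needs_escape`, `init_escape_table` and `find_escape` are made up for illustration, and it only models control characters, the double quote, the backslash, and the two 0xE2-prefixed characters (U+2028 and U+2029, which encode as E2 80 A8 and E2 80 A9). Anything else a script-safe mode might escape is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: nonzero means "this byte may need escaping".
 * Filled at startup here for readability; a static constant array
 * would be the more likely choice in production code. */
static unsigned char needs_escape[256];

static void init_escape_table(int script_safe)
{
    memset(needs_escape, 0, sizeof needs_escape);
    for (int c = 0; c < 0x20; c++) needs_escape[c] = 1; /* control characters */
    needs_escape['"'] = 1;
    needs_escape['\\'] = 1;
    /* Only the lead byte 0xE2 needs a closer look, and only when the
     * caller asked for script-safe output. */
    if (script_safe) needs_escape[0xE2] = 1;
}

/* Returns the offset of the first byte that needs escaping, or len. */
static size_t find_escape(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if (!needs_escape[c]) continue;   /* fast path: one table load per byte */
        if (c != 0xE2) return i;
        /* Candidate multi-byte sequence: confirm the two trailing bytes. */
        if (i + 2 < len && s[i + 1] == 0x80 &&
            (s[i + 2] == 0xA8 || s[i + 2] == 0xA9))
            return i;
    }
    return len;
}
```

    The point of the table is that the common case (plain ASCII that needs no escaping) costs one load and one branch per byte, with no decoding into codepoints.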

    • JohnBooty 4 days ago
      Thanks so much for your work, and also thanks for some insight into choices you made.

      I'm not familiar with the internals of the JSON gem, but in general... yeah, it's funny right? PRs are almost never ideal. Always some compromise based on time available, code review considerations, etc.

      Everything you said makes a lot of sense!

    • byroot 3 days ago
      > At the time I wrote it, the unit tests told me that wasn't true

      Yes, it's something I changed before merging your patch.

      I didn't mean to say your patch wasn't good or anything. It was very much appreciated.

    • Thanks for your work. I understand why you tried to keep it simple. Getting ignored or rejected by a maintainer is one of the least fun things I've ever experienced. Takes real skill to get something merged in, and not just technical skill.
  • peterohler 3 days ago
    Oj author here. While it's flattering to have Oj be the standard to beat, I'd like to point out that most of the issues with Oj revolve around the JSON gem and Rails doing a monkey patch dance and Oj trying to keep pace with the changes. Oj.mimic_JSON attempts to replace the JSON gem and only replaces the monkey patches made by that gem. The preferred approach for Oj, outside of trying to mimic the JSON gem, is to never monkey patch. That approach is used in all other modes that are not mimicking the JSON gem or Rails. I should point out that other Oj modes perform much better than the JSON gem and Rails modes.
    • byroot 3 days ago
      > I should point out that other Oj modes perform much better than the JSON gem

      Which modes are those? https://github.com/ohler55/oj/blob/develop/pages/Modes.md#oj...

      I tried:

          Oj.dump(obj, mode: :strict)
      
      and a few others and none seemed faster than `json 2.9.1` on the benchmarks I use.

      Edit:

      Also, most of these modes simply aren't correct in my opinion:

          >> Oj.dump(999.9999999999999, { mode: :compat })
          => "999.9999999999999"
          >> Oj.dump(999.9999999999999, { mode: :strict })
          => "1000"
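
      To make the effect above concrete outside of Ruby, here is a small C sketch. It is not Oj's code, just an illustration of what formatting a double with 15 versus 17 significant digits does; `round_trips` is a hypothetical helper, and it assumes IEEE 754 doubles and a correctly rounding `snprintf`/`strtod`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats `value` with `digits` significant digits into `buf`, parses the
 * text back, and reports whether the parsed double equals the original. */
static int round_trips(double value, int digits, char *buf, size_t size)
{
    assert(size > 0);
    snprintf(buf, size, "%.*g", digits, value);
    return strtod(buf, NULL) == value;
}
```

      With `999.9999999999999`, 15 digits yields the text `1000`, which parses back to a different double, while 17 digits (enough for any IEEE 754 double) parses back to the original value.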
      • peterohler 3 days ago
        Using the benchmarks in the Oj test directory Oj has a slight advantage over the core json for dumping but not enough to make much difference. The comparison for Oj strict parsing compared to the core json is more substantial as 1.37 times faster. The benchmarks use a hash of mixed types including some nested elements.

        The callback parsers (Saj and Scp) also show a performance advantage as does the most recent Oj::Parser.

        As for the dumping of floats that are at the edge of precision (16 places), Oj does round to 15 places if the last 4 digits of a 16 digit float are "0001" or "9999", unless the float precision is set to zero. That is intentional. If that is not the desired behavior and the Ruby conversion is preferred, then setting the float precision to zero will not round. You picked the wrong options for your example.

        I would like to say that the core json has come a very long way since Oj was created and is now outstanding. If the JSON gem had started out where it is now I doubt I would have bothered writing Oj.

        • byroot 3 days ago
          > Using the benchmarks in the Oj test directory

          I'm sorry, but I've looked for a while now, and I can't seem to identify the benchmark you are mentioning. I suspect it's the one John took for his benchmark suite? [0]

          > Oj has a slight advantage over the core json for dumping but not enough to make much difference

          I'd be curious to see which benchmark you are using, because on the various ones included in ruby/json, Oj is slightly slower on about all of them: https://gist.github.com/byroot/b13d78e37b5c0ac88031dff763b3b..., except for scanning strings with lots of multi-byte characters, but I have a branch I need to finish that should fix that.

          > The comparison for Oj strict parsing compared to the core json is more substantial as 1.37 times faster

          Here too I'd be curious to see your benchmark suite because that doesn't match mine: https://gist.github.com/byroot/dd4d4391d45307a47446addeb7774...

          > The callback parsers (Saj and Scp) also show a performance advantage as does the most recent Oj::Parser.

          Yeah, callback parsing isn't something I plan to support, at least not for now. As for Oj::Parser, `ruby/json` got quite close to it, but then @tenderlove pointed out to me that the API I was trying to match wasn't thread safe, hence it wasn't a fair comparison, so now I still bench against it, but with a new instance every time: https://github.com/ruby/json/pull/703.

          > You picked the wrong options for your example.

          No, I picked them deliberately. That's the sort of behavior users don't expect and can be bitten by. As a matter of fact, I discovered this behavior because one of the benchmark payloads (canada.json) doesn't roundtrip cleanly with Oj's default mode, that's why I benchmark against the `:compat` mode. IMO truncating data for speed isn't an acceptable default config.

          [0] https://github.com/jhawthorn/rapidjson-ruby/blob/518818e6768...

          • peterohler 3 days ago
            The strict mode benchmarks for Oj are in test/perf_strict.rb. Others are in perf_*.rb.

            If callback parsing is not supported that's fine. Oj does support callback parsing as it allows elements in a JSON document to be ignored. That saves memory and GC work, and improves performance. Your choice of course, just as including callback parsers is a choice for Oj.

            Ok, so you picked options that you knew would fail. Again, your choice, but there are certainly others that would trade 16+ significant digits for a slight improvement in performance. It's a choice. You are certainly entitled to your opinions but that doesn't mean everyone will share them.

            I'm not sure what platform you are testing on but i'm sure there will be variations depending on the OS and the hardware. I tested on MacOS M1.

            • byroot 3 days ago
              > If callback parsing is not supported that's fine.

              Yes, as mentioned in part 1 of the series, my goal for ruby/json, given it is part of Ruby's stdlib, is to be good enough so that the vast majority of users don't need to look elsewhere, but it isn't to support every possible use case or to make a specific gem obsolete. The minority of users that need things like event parsing can reach for Oj.

              > but that doesn't mean everyone will share them.

              Of course. When I was a fairly junior developer, I heard someone say: "Performance should take a backseat to correctness", and that still resonates with me. That's why I wouldn't consider such truncation as a default.

              > i'm sure there will be variations depending on the OS and the hardware. I tested on MacOS M1.

              I suspect so too. I'd like to get my hands on a x86_64/Linux machine to make sure performance is comparable there, but I haven't come to it yet. All my comparisons for now have been on M3/macOS.

              > It looks like a lot of time and effort went into the analysis.

              It was roughly two weeks full time, minus some bug fixes and such. I think in the end I'll have spent more time writing the blog series than on the actual project, but that probably says more about my writing skill :p

              Anyway, thanks for the pointers, I'll have a look to see if there's some more performance that can be squeezed out.

              • peterohler 3 days ago
                If you would like to discuss separately on a call or chats I'd be up for that. Maybe kick around a few ideas.
          • peterohler 3 days ago
            I missed responding to your assertion that the Oj::Parser was not thread safe. An individual Oj::Parser instance is not thread safe, just like other Ruby objects such as a Hash, but multiple Oj::Parser instances can be created in as many threads as desired. The reason each individual Oj::Parser is not thread safe is that it stores the parser state.
            • byroot 3 days ago
              Yes that's what I meant. The benchmark suite I took from rapidjson was benchmarking against:

                  Oj::Parser.usual.parse(string)
              
              That is what isn't thread safe. And yes you can implement a parser pool, or simply do something like:

                  parser = (Thread.current[:my_parser] ||= Oj::Parser.new(:usual))
              
              But that didn't really feel right for a benchmark suite, because of the many different ways you could implement that in a real world app. So it's unclear what the real world overhead would be to make this API usable in a given application.

              > is that it stores the parser state.

              And also a bunch of parsing caches, which makes it perform very well when parsing the same document over and over, or documents with a similar structure, but not as well when parsing many different documents. But I'll touch on that in a future post when I start talking about the parsing side.

          • peterohler 3 days ago
            Just so you know, I am impressed by the depth you've delved into with JSON parsing and dumping. It looks like a lot of time and effort went into the analysis.
            • byroot 3 days ago
              Ah, I figured out why on the Oj side `ruby/json` appeared slower: https://github.com/ohler55/oj/pull/949
              • peterohler 3 days ago
                Merged. Didn't seem to make much difference though. Results for the original Oj parser are pretty close to the core json now. I'll have to update the README for Oj. It's a bit stale. The new Oj::Parser is still much faster if not restricted to the current Rails environment.
      • Twirrim 3 days ago
        Out of curiosity, I'm looking at the JSON spec. This mildly horrifies me: "This specification allows implementations to set limits on the range and precision of numbers accepted."

        The spec doesn't specify a precision or range limit anywhere (just suggests that IEEE754 might be a reasonable target for interoperability, but that supports up to 64bit floats, and it looks like Oj is dropping to 32bit floats?).

        Python and Go don't go and change the precision of floating point numbers in their implementations, but according to the standard, they're entirely entitled to, and so is Oj.

        I don't see anything in https://github.com/ohler55/oj/blob/develop/pages/Modes.md#oj... specifying that Strict will force floating points to specific precision vs other implementations

        • byroot 3 days ago
          Yes, JSON as a format is very much under specified, a lot of these sorts of things are basically implementation defined.

          In general libraries do what makes sense in the context of their host language, or sometimes what makes sense in the context of JavaScript.

          For ruby/json, I consider that if something can be round-tripped, from Ruby to JSON and back, it should be, which means not reducing float precision, nor integer precision, e.g.

              >> JSON.generate(2**128)
              => "340282366920938463463374607431768211456"
          
          But other libraries may consider that JSON implies JavaScript, hence the lack of big integers, so such numbers should be dumped as a JS decimal string or as a floating point number.

          > I don't see anything in [...] specifying that Strict will force floating points to specific precision vs other implementations

          Yes, and that's my problem with it. As you said, Oj is free to do so by the JSON spec, but I'd bet 99% of users don't know it does that, and some of them may have had data truncation in production without realizing it.

          So in terms of matching other libraries' performance, if another library is significantly faster on a given benchmark, I treat it as a bug, unless it's the result of the alternative trading what I consider correctness for speed.

  • danielbln 3 days ago
    Hacker News has had a surprising amount of Ruby news over the last few months. As Ruby is my first true (=production) love, I'm here for it.

    I've spent the last years in Python land, recently heavily LLM assisted, but I'm itching to do something with Ruby (and/or Rails) again.

    • Alifatisk 3 days ago
      I've also noticed this and honestly, it kinda excites me seeing it gaining more attention again. I am really curious about the reason; my assumption has been that Ruby is going through some kind of renaissance. Maybe the performance improvements the Ruby team & Shopify made have gained traction?
    • aarmenaa 3 days ago
      Maybe some people don't remember anymore, but there was a time when Ruby was HN's favorite language. I miss those days. I kind of get why everybody leaned into Python instead, but I'm never going to be happy about it.
  • echelon 4 days ago
    I haven't seen this many Ruby posts on HN since 2012.

    We've had a few months of pretty regular Ruby posts now, and the last week has had one almost every single day.

    I'm not a regular Rubyist, but I'm glad to see the language getting more attention.

    • mberning 4 days ago
      Agree. I am ready for a Ruby renaissance that is not so heavily focused on Rails. I have built quite a few standalone utilities in Ruby over the years and it’s a joy compared to many other languages and ecosystems.
      • I wonder what prompted this renaissance. Always loved the language, never cared much about Rails. For a long time it seemed like Ruby was never going to catch up to Python and JavaScript, which are the default languages everybody uses. Now every other day I see some awesome Ruby news about how it's getting better and faster. Not that I'm complaining.
        • kenhwang 3 days ago
          Seems like the next generation of programmers has relearned the lessons of poor package management and ecosystem fragmentation, and that it wasn't worth giving up for a slightly simpler or slightly faster language.

          Also, ruby did get a lot faster in the last couple years, which inspires people to want to help make it even faster. When someone finds gold, everyone else rushes in to look for more.

        • chris12321 3 days ago
          I think it's a combination of a lot of things over the last 2ish years:

          1. Ruby 3.0 and YJIT have provided huge performance gains for the language with further improvements still left to be implemented.

          2. Ruby releases new versions every year on Christmas day, so you're more likely to get new content around this time of year.

          3. Large Rails shops like Github and Shopify have redoubled their commitment to Ruby/Rails and invested a lot of resources into improving the developer experience with ruby-lsp.

          4. Prism, the new Ruby parser has been developed and merged into Ruby, from my understanding, it's a lot more user-friendly and fault-tolerant, allowing for the creation of more/better development tools.

          5. Rails 7/8 released a ton of exciting new features such as Hotwire, Solid suite, auth generation and others. Promising a simpler way to make high-fidelity applications.

          6. The Rails Foundation was created and has focused on improving documentation, organising Rails World and pushing the message of Rails being 'a one person framework' that can get you 'from hello world to IPO'.

          7. A growing dissatisfaction with the needless complexity of the Javascript ecosystem and cloud providers, pushing people towards the simple but powerful solutions Rails provides.

          All these individual contributions seem to have produced a snowball effect. As a long-time Rails developer, seeing there be a new Ruby and/or Rails post on the front page of HN nearly every day recently has been really exciting.

  • anitil 4 days ago
    This was really enjoyable to read - I really enjoy this kind of in-the-weeds optimisation and the author explains it all really well. I was surprised at how much Oj was willing to put on the stack! But my background is embedded and so large stack allocations have ruined my day more than once
  • kazinator 3 days ago
    It's pretty lame that ISO C doesn't provide integer-to-string conversion that does not go through a printf-family formatter, so that programs are still rolling their own "itoa" as the calendar turns to 2025.

    Format strings are compilable in principle, so that:

      snprintf(buf, sizeof buf, "%ld", long_value);
    
    can just turn into some compiler-specific run-time function. The compiler also can tell when the buffer is obviously large enough to hold any possible value, and use a function that doesn't need the size.

    How common is that, though?

    Common Lisp's format function can accept a function instead of a format string. The arguments are passed to that function and it is assumed to do the job:

      (format t (lambda (...) ...) args ...)
    
    There is a macro called formatter which takes a format string, and compiles it to such a function.

      [8]> (format t "~1,05f" pi)
      3.14159
      NIL
      [9]> (format t "~10,5f" pi)
         3.14159
      NIL
      [10]> (format t (formatter "~10,5f") pi)
         3.14159
      NIL
      [11]> (macroexpand '(formatter "~10,5F"))
      #'(LAMBDA (STREAM #:ARG3345 &REST #:ARGS3342) (DECLARE (IGNORABLE STREAM))
         (LET ((SYSTEM::*FORMAT-CS* NIL)) (DECLARE (IGNORABLE #:ARG3345 #:ARGS3342))
          (SYSTEM::DO-FORMAT-FIXED-FLOAT STREAM NIL NIL 10 5 NIL NIL NIL #:ARG3345) #:ARGS3342)) ;
      T
    
    In this implementation, formatter takes "~10,5f" and spins it into a (system::do-format-fixed-float ...) call where the field width and precision arguments are constants. Just the stream and numeric argument are passed in, along with a bunch of other arguments that are defaulted to nil.

    I think CL implementations are allowed to apply formatter implicitly, which would make sense at least in code compiled for speed.

    Just think: this stuff existed before there was a GNU C compiler. It was huge progress when it started diagnosing mismatches between format string literals and printf arguments.

  • Joker_vD 3 days ago
    Man, if I had a nickel every time I saw an itoa/ltoa/lltoa implementation that doesn't work on the most negative number I'd have about $10 or so, I think.

    The annoying thing about it is that all the workarounds I know about really ain't that pretty:

    1. You can hard-code the check against it and return a hardcoded string representation of it:

        if (number == -9223372036854775808) return "-9223372036854775808";
    
    By the way, "(number && (number == -number))" condition doesn't work so don't try to be too smart about it: just compare against INT_MIN/LONG_MIN/etc.

    2. You can widen the numeric type, and do the conversion in the larger integer width, but it doesn't really work for intmax_t and it is, of course, slower. Alternatively, you can perform only the first iteration in the widened arithmetic, and do the rest in the original width, but this leads to some code duplication.

    2a. You can do

        unsigned unumber = number;
        if (number < 0) unumber = -unumber;
    
    and convert the unsigned number instead. Again, you can choose to do only the first iteration in unsigned, on platforms where unsigned multiplication/division is slower than the signed ones. Oh, and again, beware that the "unsigned unumber = number < 0 ? -number : number" way of conversion doesn't work.

    3. You can, instead of turning the negative numbers into positive ones and working with the positive numbers, do the opposite: turn positive numbers into negative ones and work exclusively with the negative numbers. Such conversion is safe, and division in C is required to truncate to zero, so it all works out fine except for the fact that the remainders will be negative; you'll have to deal with that.

    But yeah, converting integers into strings is surprisingly slow; not as slow as converting floats, but still very noticeable. Maybe BCDs weren't such a silly idea, after all...
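
    Workarounds 2a and 3 above can be sketched roughly as follows. This is only an illustration, assuming a 64-bit `long long`; the function names `lltoa_unsigned`, `lltoa_negative` and `copy_out` are invented for the example, and the caller is expected to pass a buffer of at least 21 bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Copies len characters plus a terminating NUL into out; returns len. */
static size_t copy_out(const char *digits, size_t len, char *out, size_t size)
{
    assert(len + 1 <= size);
    memcpy(out, digits, len);
    out[len] = '\0';
    return len;
}

/* Workaround 2a: convert to unsigned first, then negate. Unsigned
 * arithmetic wraps, so 0 - (unsigned)LLONG_MIN is well defined and
 * gives the magnitude of the most negative value. */
static size_t lltoa_unsigned(long long number, char *out, size_t size)
{
    char tmp[24];                       /* 19 digits + sign fits easily */
    size_t pos = sizeof tmp;
    unsigned long long u = (unsigned long long)number;
    if (number < 0) u = 0ULL - u;

    do {
        tmp[--pos] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (number < 0) tmp[--pos] = '-';
    return copy_out(tmp + pos, sizeof tmp - pos, out, size);
}

/* Workaround 3: stay in the negative domain. Negating a non-negative
 * value never overflows, and since C99 division truncates toward zero,
 * so each remainder lies in -9..0 and is subtracted from '0'. */
static size_t lltoa_negative(long long number, char *out, size_t size)
{
    char tmp[24];
    size_t pos = sizeof tmp;
    long long n = number < 0 ? number : -number;

    do {
        tmp[--pos] = (char)('0' - n % 10);
        n /= 10;
    } while (n != 0);
    if (number < 0) tmp[--pos] = '-';
    return copy_out(tmp + pos, sizeof tmp - pos, out, size);
}
```

    Both variants handle `LLONG_MIN` without special-casing it; the second one never leaves the signed type, at the cost of the slightly odd `'0' - n % 10`.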

    • nick__m 3 days ago
      BCD for money-like datatypes makes a lot of sense, but there's a strange lack of support for it in x64 (while the support in x86 was so anemic as to be almost useless). Easy-to-implement, correct low-level financial math operations seem to be reserved to the mainframe!

      Does anyone know why Intel publishes a DFP (decimal floating point) library instead of pushing those instructions down to the microcode level like the mainframes do?

      • adgjlsfhk1 3 days ago
        probably because in the GHz era none of this has to be fast anyway. financial programs will be spending all their time waiting for disk or network anyway. it's pretty hard to imagine a program that needs to care about decimal floating point in finance that would be slow at a human scale as a result of doing this in software
      • KerrAvon 3 days ago
        I don’t specifically know Intel’s thinking, but the usual conclusion from CPU designers is that decimal instructions are so rarely used that it’s not worth allocating silicon budget to them.
  • meisel 3 days ago
    Would allocating a 640-byte string initially really be the right tradeoff? It seems like it could result in a lot of memory overhead if many small strings are created that don’t need that space. But it does save a copy at least

    As for the int-to-string function, using the division result to do a faster modulus (eg with the div function) and possibly a lookup table seem like they’d help (there must be some good open source libraries focused on this to look at).
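
    A rough sketch of that idea, not taken from any particular library: handle two digits per division by 100, derive the remainder from the quotient with a multiply and subtract (the sketch does this by hand instead of calling `div`), and look the digit pair up in a 200-byte table. `DIGIT_PAIRS` and `u64_to_str` are names invented for the example, and a 64-bit `unsigned long long` is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "00", "01", ..., "99" laid out back to back. */
static const char DIGIT_PAIRS[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Writes the decimal form of value into out (NUL-terminated) and
 * returns the number of digits written. */
static size_t u64_to_str(unsigned long long value, char *out, size_t size)
{
    char tmp[20];                       /* 2^64 - 1 has 20 digits */
    size_t pos = sizeof tmp;

    while (value >= 100) {
        unsigned long long q = value / 100;
        unsigned r = (unsigned)(value - q * 100);  /* remainder reuses the quotient */
        value = q;
        pos -= 2;
        memcpy(tmp + pos, DIGIT_PAIRS + 2 * r, 2);
    }
    if (value >= 10) {
        pos -= 2;
        memcpy(tmp + pos, DIGIT_PAIRS + 2 * value, 2);
    } else {
        tmp[--pos] = (char)('0' + value);
    }

    size_t len = sizeof tmp - pos;
    assert(len + 1 <= size);
    memcpy(out, tmp + pos, len);
    out[len] = '\0';
    return len;
}
```

    This halves the number of divisions compared to the digit-at-a-time loop; whether it wins in practice depends on the compiler and the distribution of values, so it would need benchmarking.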

    • byroot 3 days ago
      > Would allocating a 640-byte string initially really be the right tradeoff?

      It depends, presumably the generated JSON string would quickly be written down inside something else (e.g. sent as HTTP response or saved in database), so the object slot would be freed rather quickly.

  • Lammy 4 days ago
    This makes me want to re-compare REXML and ox (from the same author as oj) which I use heavily.
  • f33d5173 3 days ago
    > Why ato? Because C doesn’t really have strings, but arrays of bytes, hence “array to int” -> atoi.

    The lore I was familiar with was that a stood for ascii.

  • akdas 4 days ago
    I recommend reading the previous 3 parts too, plus I'm looking forward to the next parts. I love that it goes into details and very clearly explains the problems and solutions, at least if you're familiar with C and know some things about compiler implementations.
  • thiago_fm 3 days ago
    Really enjoyed the new article, as always!

    Maybe at the end, he should have shown the two profiles again for comparison :D

  • dudeinjapan 4 days ago
    What's the reason for not merging/using Oj as the Ruby core JSON library?
  • cannibalXxx 3 days ago
    If you're just starting out in this language, here's a list that will help you with that [0] https://chat-to.dev/post?id=453, [1] https://chat-to.dev/post?id=457 [2] https://chat-to.dev/post?id=565