[jira] [Created] (BATIK-1183) Performance of <use> and <symbol>

JIRA jira@apache.org
Erich Schubert created BATIK-1183:

             Summary: Performance of <use> and <symbol>
                 Key: BATIK-1183
                 URL: https://issues.apache.org/jira/browse/BATIK-1183
             Project: Batik
          Issue Type: Improvement
          Components: Bridge
    Affects Versions: trunk
            Reporter: Erich Schubert
         Attachments: scatter.svg.gz

In ELKI, we use Batik for scatterplots.
Marker symbols are generated as <symbol> tags, each then placed at the individual locations with a <use>. This is nice for post-editing (because a symbol can be changed in a single place), but the performance of this approach is quite bad (to the point where I am considering dropping Batik and trying something else).

When analyzing performance bottlenecks, I noticed the following things:
1. A substantial amount of time (far too much) goes into listener list management. (Yes, I want support for dynamic changes, so I do need listeners.) It seems that several listeners are added for every <use>?
2. String.intern is a major cost. I understand that we need to intern strings, but we should avoid repeating it so often.
3. When a <symbol> is used, it gets cloned. With thousands of <use> tags, this becomes a substantial cost, in particular because every string is interned again for every use.
(org.apache.batik.bridge.SVGUseElementBridge#buildCompositeGraphicsNode calls 'importNode'.)
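To illustrate points 2 and 3, here is a minimal sketch of copying a node without interning its strings again. SketchElement, create, copy and internCalls are hypothetical names for illustration and are not Batik classes or methods. The sketch assumes the copy happens within the same JVM, so the source strings are already the canonical interned instances.

```java
/** Hypothetical stand-in for a DOM element; not a Batik class. */
final class SketchElement {
    /** Counts String.intern calls so the saving is observable. */
    static int internCalls = 0;

    private final String namespaceURI; // invariant: always interned
    private final String localName;    // invariant: always interned

    /** Private: callers must go through create() or copy(). */
    private SketchElement(String internedNs, String internedName) {
        this.namespaceURI = internedNs;
        this.localName = internedName;
    }

    /** Entry point for arbitrary strings (e.g. from a parser): interns once. */
    static SketchElement create(String ns, String name) {
        return new SketchElement(intern(ns), intern(name));
    }

    /** Copy of an existing element: fields are already interned, so reuse them. */
    SketchElement copy() {
        return new SketchElement(namespaceURI, localName);
    }

    String namespaceURI() { return namespaceURI; }

    String localName() { return localName; }

    private static String intern(String s) {
        internCalls++;
        return s.intern();
    }
}
```

With this shape, creating one element and copying it a thousand times performs two intern calls in total, rather than two per copy. Whether Batik's importNode path can preserve such an invariant is not something this sketch establishes.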

I have tried to improve some of these things in my speedup branch:

In particular, SVGConstants.SVG_NAMESPACE_URI is now recognized and not interned again, since we expect to see this namespace very often. I also replaced the listener list management with something much simpler (and more efficient, since some of the functionality was never used).
I could not tackle the number of listeners or the cloning, as I am not familiar enough with Batik internals.
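The namespace shortcut could look roughly like the following. This is a sketch of the idea only, not the code from the branch: NamespaceInterner and internNamespace are hypothetical names, and the constant is redeclared here (with the same value as SVGConstants.SVG_NAMESPACE_URI) so that the example stands alone.

```java
/** Hypothetical helper; not part of Batik. */
final class NamespaceInterner {
    /** Same value as Batik's SVGConstants.SVG_NAMESPACE_URI, redeclared for a standalone sketch. */
    static final String SVG_NAMESPACE_URI = "http://www.w3.org/2000/svg";

    private NamespaceInterner() {}

    /**
     * Returns the canonical instance for a namespace URI.
     * The SVG namespace is recognized up front, so String.intern is
     * only called for the (rarer) other namespaces.
     */
    static String internNamespace(String uri) {
        if (uri == null) {
            return null;
        }
        // Cheap identity check first: the string is already the canonical constant.
        if (uri == SVG_NAMESPACE_URI) {
            return uri;
        }
        // Same content: hand back the constant, which is itself interned
        // because it is a compile-time string literal.
        if (SVG_NAMESPACE_URI.equals(uri)) {
            return SVG_NAMESPACE_URI;
        }
        return uri.intern();
    }
}
```

Because string literals are interned by the JVM, returning the constant gives the same reference that uri.intern() would, so callers relying on identity comparison should see no difference.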

