Opened 3 years ago

Last modified 3 years ago

#33465 closed Cleanup/optimization

Introduce empty __slots__ protocol for SafeString & SafeData — at Initial Version

Reported by: Keryn Knight Owned by: Keryn Knight
Component: Utilities Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is a case-by-case proposal ultimately referencing #12826

Because SafeString is used a lot and is otherwise supposed to be treatable as a untainted str we should be able to (AFAIK) update it + it's inheritance chain to use __slots__ = () whilst still allowing custom subclasses of either to add additional attributes. By defining __slots__ as empty on SafeString (and SafeData) we'd avoid creation of a __dict__ on the instance, which mirrors the str() behaviour.

According to pympler, currently in Python 3.10 using the following back of the napkins strings:

In [4]: s = "test"  # this might be interned, as a short string?

In [5]: s2 = "test" * 100

In [6]: s3 = SafeString("test")

In [7]: s4 = SafeString("test" * 100)

we get:

In [8]: asizeof(s)  # str
Out[8]: 56

In [9]: asizeof(s2)  # str
Out[9]: 456

In [10]: asizeof(s3)  # SafeString
Out[10]: 208

In [11]: asizeof(s4)  # SafeString
Out[11]: 608

But if we swap out the implementation to be slots'd, it looks more like:

In [8]: asizeof(s)  # str
Out[8]: 56

In [9]: asizeof(s2)  # str
Out[9]: 456

In [10]: asizeof(s3)  # SafeString
Out[10]: 104

In [11]: asizeof(s4)  # SafeString
Out[11]: 504

So we're "saving" 104 bytes per SafeString created, by the look of it. I presume it to be some fun implementation detail of something somewhere that it is allegedly accounting for more than 56 bytes, which is the asizeof({})

A quick and dirty check over the test suite suggests that for me locally, running 14951 tests in 512.912s accounted for 949.0 MB of SafeStrings, checked by just incrementing a global integer of bytes (using SafeString.__new__ and --parallel=1) and piping that to filesizeformat, so y'know, room for error.
After the patch, the same tests accounted for 779.4 MB of SafeString, "saving" 170 MB overall.

The only functionality this would preclude -- as far as I know -- is no longer being able to bind arbitrary values to an instance like so:

s = SafeString('test')
s.test = 1

which would raise AttributeError if __slots__ were added, just like trying to assign attributes to str() directly does.

I don't believe this will have any marked performance change, as neither SafeString nor SafeData actually have any extra attributes, only methods.

I have a branch which implements this, and tests pass for me locally.

Change History (0)

Note: See TracTickets for help on using tickets.
Back to Top