Opened 17 years ago
Closed 17 years ago
#6965 closed (fixed)
urlize should be faster
Reported by: | floguy | Owned by: | Andrew Badr |
---|---|---|---|
Component: | Template system | Version: | dev |
Severity: | | Keywords: | |
Cc: | floguy@…, andrew@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Running the `urlize` or `urlizetrunc` filter on very large text, or running it many times, seems to slow the system down more than it should.
Attachments (2)
Change History (14)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Cc: | added |
---|---|
Has patch: | set |
Owner: | changed from | to
Status: | new → assigned |
by , 17 years ago
Attachment: | faster_urlize.diff added |
---|
Improves speed by about 50x. Tests still pass.
comment:3 by , 17 years ago
From a quick skim of your code, it seems like you're changing how the escaping works. What if a URL has an ampersand in it? Couldn't you potentially break the HTML if it trimmed to the limit halfway through an escaped `&` (i.e. `&amp;`)?
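To make the concern concrete: if the text is HTML-escaped first and truncated afterwards, the cut can land inside an entity such as `&amp;` and leave broken markup. The minimal sketch below uses the standard library's `html.escape`; the helper names `trim_after_escape` and `trim_before_escape` are hypothetical and are not Django's API or the code in the attached patch. Truncating the raw text first and escaping afterwards keeps every entity whole.

```python
import html


def trim_after_escape(url: str, limit: int) -> str:
    """Escape first, then truncate: the cut may split an entity like '&amp;'."""
    return html.escape(url)[:limit]


def trim_before_escape(url: str, limit: int) -> str:
    """Truncate the raw text first, then escape: entities always stay whole."""
    return html.escape(url[:limit])


url = "http://example.com/?a=1&b=2"
# The '&' sits at index 23 of the raw URL; escaping expands it to the
# five-character '&amp;', so a limit of 26 cuts that entity in half.
print(trim_after_escape(url, 26))   # http://example.com/?a=1&am
print(trim_before_escape(url, 26))  # http://example.com/?a=1&amp;b=
```

Note that a limit applied after escaping also counts entity characters, so the visible text is shorter than intended even when the cut happens to miss an entity.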
comment:4 by , 17 years ago
Patch needs improvement: | set |
---|
Great point! I'm going to have to think about how to overcome that, since it'll probably foil this whole current scheme.
comment:5 by , 17 years ago
floguy, are you still working on this? Otherwise, I can take a stab at it.
comment:6 by , 17 years ago
No, go ahead. I implemented it another way, but it made the code almost as slow as the old version :-(
comment:7 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:8 by , 17 years ago
Owner: | removed |
---|
I took a shot at this over the last couple of days, but the overhead seems to be largely in the regular expressions; `punctuation_re` in particular seems clunky. However, I don't know enough about regular expressions to optimize them.
comment:9 by , 17 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:10 by , 17 years ago
Cc: | added |
---|---|
Component: | Uncategorized → Template system |
Patch needs improvement: | unset |
I created a patch that skips the regex match for most words. This gave a 10x speed improvement on a test set of real posts. I check for '@' and ':' on line 98 because I don't want to change the behavior for URLs such as http://localhost/, even though it's not clear whether those should be supported at all (I don't think so). The `len` call was also removed for good measure.
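The fast-path idea can be sketched in plain Python: split the text into words and run the comparatively expensive regular expression only on words containing '.', '@' or ':'. Under the simplified link rules assumed here, a word with none of those characters cannot be a link, and checking ':' keeps dot-less URLs such as http://localhost/ on the slow path. The regexes, the function name `urlize_sketch` and its `fast_path` parameter are hypothetical illustrations, not the attached patch or Django's `urlize`.

```python
import html
import re

# Hypothetical, simplified patterns; Django's real urlize is more careful.
# Splits a word into leading punctuation, the candidate link, and trailing punctuation.
PUNCTUATION_RE = re.compile(r"^([(\[<]*)(.*?)([.,:;)\]>]*)$", re.DOTALL)
SIMPLE_DOMAIN_RE = re.compile(r"^[a-z0-9.-]+\.(?:com|net|org)$", re.IGNORECASE)
SIMPLE_EMAIL_RE = re.compile(r"^[^@\s]+@[a-z0-9.-]+\.[a-z]+$", re.IGNORECASE)


def _link_for(middle):
    """Return an href for `middle`, or None if it doesn't look like a link."""
    if middle.startswith(("http://", "https://")):
        return middle
    if middle.startswith("www.") or SIMPLE_DOMAIN_RE.match(middle):
        return "http://" + middle
    if SIMPLE_EMAIL_RE.match(middle):
        return "mailto:" + middle
    return None


def urlize_sketch(text, fast_path=True):
    # The capturing group makes re.split keep the whitespace tokens,
    # so joining the pieces back together preserves the original spacing.
    words = re.split(r"(\s+)", text)
    for i, word in enumerate(words):
        # Fast path: every rule in _link_for needs '.', '@' or ':', so a word
        # without any of them is escaped as-is and the regex is skipped.
        if fast_path and "." not in word and "@" not in word and ":" not in word:
            words[i] = html.escape(word)
            continue
        lead, middle, trail = PUNCTUATION_RE.match(word).groups()
        href = _link_for(middle)
        if href is None:
            words[i] = html.escape(word)
        else:
            words[i] = (
                f'{html.escape(lead)}<a href="{html.escape(href)}">'
                f"{html.escape(middle)}</a>{html.escape(trail)}"
            )
    return "".join(words)


print(urlize_sketch("see http://localhost/ or example.com, mail me@example.org."))
```

Because the pre-check only rules out words that no rule could match, it skips work without changing the output; the `fast_path=False` switch exists so the two paths can be compared directly, for example with `timeit` on a sample of real posts.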
comment:11 by , 17 years ago
Triage Stage: | Unreviewed → Ready for checkin |
---|
Patch looks good. The existing tests should be enough, so I think it's ready to rock.
comment:12 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I'd be +1 on not converting email addresses to mailto: links by default, and on adding an extra filter, `urlize_with_emails`, that supports email-to-mailto.