Opened 16 years ago
Closed 13 years ago
#10793 closed Bug (fixed)
Sitemap implementation means it cannot be updated automatically
Reported by: | tenkujin | Owned by: | Grzegorz Nosek |
---|---|---|---|
Component: | contrib.sitemaps | Version: | 1.4-alpha-1 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The Sitemap class uses an attribute _paginator which stores all the URLs. The method items() is thus called only once for the creation of the paginator, and subsequent invocations of the sitemap view do not call it. If the items() method is meant to return dynamically changing items (which is a common case), changes are not reflected in the sitemap. The only way to fix this is to either restart Django (not acceptable), override the get_urls() method and update the _paginator (not nice), or re-implement the Sitemap object to not have such semantics (violates DRY).
Since this paginator is a hack to accommodate Google's arbitrary 50k URL limit, I think it would be best to have a Sitemap which can be updated at runtime, and an optional GoogleSitemap for this particular search engine.
Attachments (4)
Change History (19)
comment:1 by , 16 years ago
follow-up: 3 comment:2 by , 16 years ago
The 50k limit is not arbitrary. Google, like MSN and Yahoo, use the sitemaps.org standard which specifies the 50k limit.
See http://sitemaps.org/protocol.php#index for more Information.
comment:3 by , 16 years ago
Replying to arthurk:
The 50k limit is not arbitrary. Google, like MSN and Yahoo, use the sitemaps.org standard which specifies the 50k limit.
See http://sitemaps.org/protocol.php#index for more Information.
Still, by storing the paginator in the self._paginator field and not providing a way to clear it (signals? cache? method?) means we need to fiddle with the internals of Django sitemaps simply to have dynamic sitemaps. I think the pagination logic should somehow check to see whether the query set it uses has changed since the last time the paginator was created and assigned to _paginator
comment:4 by , 15 years ago
Here's a solution to the problem, which I'm currently using on my website.
I applied the above patch to django 1.1 and I modified urls.py to look like this:
urlpatterns = patterns( '', url(r'^sitemap.xml$', 'django.contrib.sitemaps.views.index', {'sitemaps': get_sitemaps}, name='sitemap_index'), url(r'^sitemap-(?P<section>.+)\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': get_sitemaps}, name='sitemap'), )
where get_sitemaps is a function which returns the sitemaps directory.
def get_sitemaps(): sitemaps = {} sitemaps['blogs'] = BlogSitemap ...... return sitemaps
Using this method my sitemaps are instanciated on every request, and I get a new paginator, and expected results.
comment:5 by , 15 years ago
The approach in sitemaps.patch invalidates the current documentation, and doesn't take into consideration the fact that an previously instantiated Sitemap instance may be used instead of a subclass. I don't know if the pagination should really be handled by the Sitemap class at all, since it seems like a more view specific attribute. Since ditching the class based pagination would require a lot of view changes, I've been using a subclass of Sitemap whose paginator attribute is instantiated each time it is accessed. When using the current sitemap views, the paginator will only be accessed once, so only one paginator is initialized for each request (luckily).
For reference, I believe this bug only shows up if you add a Sitemap instance to the sitemaps dict. Using classes only shouldn't be a problem since the short-lived instance will be recreated each time the view is called.
Should we dumb down Sitemap and move pagination into the views? It seems like that would be more flexible.
comment:6 by , 15 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:7 by , 14 years ago
Component: | Contrib apps → contrib.sitemaps |
---|
comment:8 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
by , 13 years ago
Attachment: | paginator.diff added |
---|
Dynamic paginator creation, checking for DB changes and time limit
comment:9 by , 13 years ago
Easy pickings: | unset |
---|---|
Has patch: | set |
UI/UX: | unset |
We check time to life of paginator. If it is older than 10 minutes (Sitemap.reload_delta), we force recreation of new paginator. If it is older than 1 minute (Sitemap.counter_delta) we check whether number of items in DB has changed and recreate paginator if so.
comment:10 by , 13 years ago
Version: | 1.0 → 1.4-alpha-1 |
---|
by , 13 years ago
Attachment: | paginator_doc.diff added |
---|
Added documentation to previous patch, moved checking code to separate method.
comment:11 by , 13 years ago
Needs tests: | set |
---|
comment:12 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 13 years ago
Attachment: | 10793.diff added |
---|
comment:13 by , 13 years ago
Needs tests: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:14 by , 13 years ago
The Paginator object is needlessly cached for the whole lifetime of the Sitemap object, which may be the whole lifetime of the server.
After discussing with jezdez, we considered the potential performance impact (evaluating len(sitemap.items()
) at every request acceptable for the time being. Fixing this properly would require Paginators to lazily determine the number of pages only when actually needed, which is quite outside of the scope of this ticket.
"subsequent invocations of the sitemap view do not call it"
Are you sure? AFAIK Django treats every request as a completely new one.