#11663 closed Uncategorized (wontfix)
Delete orphaned replaced files
Reported by: | Chris Beaven | Owned by: | Chris Beaven |
---|---|---|---|
Component: | File uploads/storage | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Johannes Bornhold | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When using a model instance with a FileField
(or subclass), you can update the image by setting a new file as an instance matching the field. This leaves the potential of an orphaned file.
A common case is that you have a ModelForm
(bound to an existing instance) for which you use to upload a new file:
- The form is saved in your view which overwrites the linked
FieldFile
with a new one (which will be based on the data from theUploadedFile
). - Now there is no reference to the old file, so even custom storage can't touch it (unless you use an
upload_to
callable which names it the same and your custom storage is changed to dangerously blindly overwrite files with the same name).
Note, the delete orphan logic already exists - FileField
attaches a post_delete signal to it's Model class which does this.
This is obviously a change in functionality. To preserve backwards compatibility, I recommend that this is a new argument on FileField
(off by default).
Attachments (3)
Change History (25)
by , 15 years ago
Attachment: | 11663.diff added |
---|
comment:1 by , 15 years ago
Needs documentation: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:2 by , 15 years ago
On the two prior tickets where Chris has been pushing this, there were notes indicating that this behavior was easy to achieve with the file-storage refactor that went into Django 1.0. Unfortunately, what I had been discussing was simply the ability to overwrite existing files easily enough. What Chris -- and many others -- are asking for here is the ability to make that decision on a per-instance basis, which currently isn't easy.
To clarify the comment about "dangerously" blindly overwriting files, Chris is referring to the situation where one user uploads a file and another uploads another with the same name, overwriting the first user's file. That was my suggestion in the past, and I do have to agree with Chris that it was a naive approach that has some pretty significant downsides.
comment:3 by , 15 years ago
I know naming things is hard in computer science, but come on... safe_delete_replaced
?! Implies there is also dangerous_delete_replaced
. All the usages of safe_
prefixes in this patch make my brain hurt. Either we're deleting or we're not. It's neither safe nor dangerous, so let's try to avoid confusing things with non-useful information in the parameter name.
Why does the save()
method needs an extra parameter? Deleting is a property of the field itself (if you only want to delete sometimes, it's pretty easy already to override the save()
method). I'm still fairly under-enthused by this idea, but at least let's make the patch correct so that we can see what's being evaluated.
by , 15 years ago
Attachment: | 11663.2.diff added |
---|
comment:4 by , 15 years ago
Thanks for your initial comments, Malcolm. I've uploaded a new version addressing these.
- "safe" implied only deleting orphans. But you are correct, it wasn't a very good term.
- This went through several iterations -
save()
doesn't need an extra property now and this has been removed.
I'm a bit saddened as to lack of enthusiasm for inclusion.
comment:5 by , 15 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:6 by , 15 years ago
Just two quick comments from a luser: I fail to see the rationale in leaving orphaned files around, why would you make this configurable and turned off instead of just assuming it to be default behaviour? (On a similar note, I fail to see how two different objects can point to the same file unless someone has hacked the field paths, in which case they asked for it. So the database lookup appears unnecessary. But I guess it's just a carry over. By the way, does this field install an index? Otherwise, what happens when you try to delete user profiles with images on a site with 1.000.000 users?)
The other comment is that this still doesn't not fix issue 4339 since there is no easy knob for telling Django "this filename, and I mean it, no underscores" for fields where you do not use the original filename but auto-generate one (say based on DB id).
But it's great to see you're working on it. :)
comment:7 by , 15 years ago
It is currently optional because it is a change in functionality. Currently, files aren't touched if the model is updated (they are already removed if the model is deleted). A site may be relying on this behaviour. I didn't want to rock the boat more than possible.
But let's leave design discussion on the Django Developers google group. Add a comment here with a link to discussion thread.
Regarding #4339, if that's what it was after, then it's a straight dupe of the original meaning of #2983, which was closed as a wontfix (as this is possible by using custom FileStorage). Again, if you contest this then bring it up in the google group.
comment:8 by , 15 years ago
Okaypokey. Here you go:
http://groups.google.com/group/django-developers/browse_thread/thread/491619541ba6ac75
comment:9 by , 14 years ago
Cc: | added |
---|
Is there any progress on this ticket?
I was just hit by the behavior of FileField, and did notice it only notice it by accident. So if there are good reasons not to change the API, it may be a good thing to add a note or a warning note to the documentation of FileField, so users of FileField get a real chance to notice this behavior.
This just would have saved me some hours of googling around, so I am willing to prepare a patch for the docs if it is appreciated, this patch should be easier to integrate (no side effects in functionality and no API change) until a design decision will be done in the far future. ;-)
follow-up: 16 comment:10 by , 14 years ago
Hello this problem was bothering me A LOT.
I just added a custom save method for my model and it seems to be working fine though:
def save(self, *args, **kwargs): try: this = MyModelName.objects.get(id=self.id) if this.MyImageFieldName != self.MyImageFieldName: this.MyImageFieldName.delete() except: pass super(MyModelName, self).save(*args, **kwargs)
Hopefully this helps anyone looking for a quick work around, let me know if it doesn't work.
comment:11 by , 14 years ago
Needs documentation: | unset |
---|
comment:12 by , 14 years ago
IMO when you just need to update existing file, Django should be able to do this (with a flag like reuse_file=True) instead of creating new and then deleting old file. To achieve this I use following workaround for unfortunately missed base functionality.
import os from base64 import b32encode from django.db.models import FileField from django.db.models.fields.files import FileDescriptor def generate_random_filename(request_filename, default_ext=''): ext = os.path.splitext(request_filename)[1] or default_ext # base32 produces url-friendly strings, but shorter than hex. rnd_string = b32encode(os.urandom(settings.SAFE_IMAGE_BYTE_LENGTH)) filename = rnd_string.lower() + ext return filename def _initial_field_name(field_name): return '_initial_%s_name' % field_name class StableFileDescriptor(FileDescriptor): """ The descriptor allows to remember original filename of a FileField. """ def __set__(self, instance, value): if isinstance(value, basestring): # filename of current file loaded from database instance.__dict__[_initial_field_name(self.field.name)] = value super(StableFileDescriptor, self).__set__(instance, value) class StableFileField(FileField): descriptor_class = StableFileDescriptor def generate_filename(self, instance, request_filename): filename = getattr(instance, _initial_field_name(self.name), None) if not filename: filename = generate_random_filename(request_filename) return super(StableFileField, self).generate_filename(instance, filename)
comment:13 by , 14 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
Talking with malcolm, this has about a 0.01% chance of making it into django. On that note, I'm wontfixing the issue.
Oh well, I did learn a lot about the internals :)
comment:14 by , 14 years ago
For the record (since I just got a Trac collision whilst also wontfix'ing this), my logic is: Django shouldn't be in the business of deleting files, which is more of a system-level task. We have no way of knowing what else on the system is using those files. If we *don't* delete a file and the user wants to remove it, they can use the os module. If we *do* delete a file and that wasn't the right thing (from the point of view of something else on the system), the user is screwed. We cannot guarantee that removal is correct, so we shouldn't do it.
Writing a utility function to check for orphaned files and clean them up is a simple enough operation. Whether to include it in Django or not is debateable (it's kind of like the cleanup sessions management command, I guess) and the topic for another ticket. Doing it automatically feels too dangerous to be correct at any time.
comment:15 by , 14 years ago
People who need the cleanup command that Malcolm mentions will perhaps find https://github.com/mrts/django-commands/blob/devel/django_commands/management/commands/file_cleanup.py useful.
comment:16 by , 13 years ago
Easy pickings: | unset |
---|---|
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
Hi,
The solution by metamemetics works perfectly for me!
Thank you
comment:17 by , 13 years ago
Having recently had this problem on my application, I will risk the wrath of the community in adding onto an already over-discussed, and perhaps a dead thread.
Malcolm's concerns regarding the deletion of files is valid, and honestly, I would think twice about deleting any file from my server (I am speaking from personal experience). However that said, I think that it is important to realize that more often than not, the files that are deleted are unimportant and a useless waste of disk space that many of us are paying for.
I think that as "The Web framework for perfectionists with deadlines" it is important that Django makes the development process quicker by allowing us developers to simply toggle one or more field options for file and image uploads rather than having us set up cron jobs or the like to deal with the orphaned files. It may be simple, but it takes time, and makes the process much more tedious.
Also to lay some of other Malcolm's fears to rest, I do not think that any developer worth any amount of programming talent would blame Django if an important file was deleted. The blame will lay solely on the developers inability to keep mission critical files separate from not as important ones.
Please do consider applying this as a fix to a future Django release.
comment:18 by , 13 years ago
Django cannot know whether the file is used somewhere else, but sometimes *the developer* does know this for certain and wants to delete old files. There should be an easy way to do it (say, by toggling a parameter after reading the documentation for some seconds) because there are enough interested users who are currently spending hours with this issue.
The proposed half-baked workarounds are not trivial enough; none worked for me at the first try.
I propose having FileField(upload_to='sw',replace=True), where replace=False would be the default.
comment:19 by , 13 years ago
+1 for beeing able to do something like this:
FileField(upload_to='sw',replace=True)
@DanielClemente:
The only "quick and dirty workaround" i found for this is: (see #4339 )
from django.core.files.storage import FileSystemStorage class OverwriteStorage(FileSystemStorage): def _save(self, name, content): if self.exists(name): self.delete(name) return super(OverwriteStorage, self)._save(name, content) def get_available_name(self, name): return name
comment:20 by , 13 years ago
I hate to add to the comment spam, but really, people, seeing that this is WONTFIX and that 1.3 has removed feature that a file field will delete it's associated file when the object is deleted, it's clear that Malcolm's idea of never touching the file system has won this battle.
As much as you may disagree with this, please don't comment here. It's useless and will just spam all of us. If you want to do something about this, maybe use your energy to code a new file field that does in fact clean up?
Regarding the file_cleanup.py command link posted above: there's a possible race condition there if a file ends up on disk before the corresponding object is saved in the database. If you want to make a non-racy garbage collector for this, you need to be really careful.
comment:22 by , 12 years ago
why not to compare files also by CRC together with assesing filename existence and returning name from _save is both are the same?
Patch with comprehensive tests