Opened 14 years ago

Closed 14 years ago

#16314 closed Bug (invalid)

FileSystemStorage.listdir returns names with unicode normalization form that is different from names in database

Reported by: philomat
Owned by: nobody
Component: File uploads/storage
Version: 1.3
Severity: Normal
Keywords: storage unicode normalization
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Suppose you want to write a function that finds files on disk that are no longer referenced in the database, and you use FileSystemStorage.listdir to compare the names it returns with the names stored in the database. The strings cannot be compared reliably without normalizing them first, because the same Unicode character can be represented in different normalization forms (for example, precomposed NFC versus decomposed NFD).

This problem is best demonstrated with some example code:

# Assuming that my storage root contains one folder named u'ä'

>>> import os
>>> import unicodedata
>>> from django.core.files.storage import FileSystemStorage

# listdir returns u'a' followed by 'COMBINING DIAERESIS' (U+0308);
# listdir requires a path argument, so '' is passed to list the storage root

>>> FileSystemStorage().listdir('')[0][0]
u'a\u0308'

# in the database, this character is stored using a different normalization form (precomposed):

>>> os.path.basename(FileSystemStorage().path(u'ä'))
u'\xe4'

# the values should be normalized before they are compared:

>>> unicodedata.normalize('NFC', FileSystemStorage().listdir('')[0][0])
u'\xe4'
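
The comparison described in this ticket could be sketched roughly as follows. This is only an illustration under stated assumptions, not Django's behaviour or a proposed patch: the helper names nfc and find_orphans are hypothetical, and the two lists are stand-ins for the output of FileSystemStorage.listdir and for names read from the database. Both sides are normalized to NFC before they are compared.

```python
import unicodedata


def nfc(name):
    """Return name in Unicode Normalization Form C (precomposed)."""
    return unicodedata.normalize('NFC', name)


def find_orphans(disk_names, db_names):
    """Return the names in disk_names that have no NFC-equal match in db_names.

    Hypothetical helper: both inputs are plain iterables of unicode strings.
    """
    known = set(nfc(name) for name in db_names)
    return [name for name in disk_names if nfc(name) not in known]


# Stand-in data (assumed for illustration, not taken from a real storage):
disk_names = [u'a\u0308', u'orphan.txt']  # decomposed form, as listdir returned it
db_names = [u'\xe4']                      # precomposed form, as stored

orphans = find_orphans(disk_names, db_names)
```

Normalizing both sides, rather than only the listdir result, avoids relying on the database values already being in NFC.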

Change History (1)

comment:1 by philomat, 14 years ago

Resolution: invalid
Status: new → closed