Opened 18 years ago

Closed 15 years ago

Last modified 15 years ago

#3924 closed (fixed)

Caught an exception while rendering: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Reported by: ruben.perez@…
Owned by: hugo
Component: contrib.admin
Version: 1.0
Severity:
Keywords:
Cc: jm.bugtracking@…, antoni.aloy@…, mpjung@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Malcolm Tredinnick)

Hello, after updating the Django code from SVN, my admin site doesn't work. The error is: Caught an exception while rendering: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128).

My models.py has a correct encoding declaration: # -*- coding: UTF-8 -*-. My admin site always worked with previous SVN revisions, but it doesn't work with the latest development revision, 4926. I'm using the language code "es-es".

The problem is related to non-ASCII characters in the return value of __str__ for models used with models.ForeignKey and models.ManyToManyField.

Thank you very much.

Below is the relevant part of the error page.

UnicodeDecodeError at /admin/articulos/articulo/add/
'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
Request Method: 	GET
Request URL: 	http://www..../admin/articulos/articulo/add/
Exception Type: 	UnicodeDecodeError
Exception Value: 	'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
Exception Location: 	/home/rubenper/django/django/oldforms/__init__.py in render, line 490
Template error

In template /home/rubenper/django/django/contrib/admin/templates/widget/foreign.html, error at line 2
Caught an exception while rendering: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
1 	{% load admin_modify adminmedia %}
2 	{% output_all bound_field.form_fields %}

Change History (30)

comment:1 by anonymous, 18 years ago

Resolution: invalid
Status: new → closed

comment:2 by anonymous, 18 years ago

Resolution: invalid
Status: closed → reopened

Hi, I'm reopening this bug because the problem persists.

The problem is related to this changeset:

http://code.djangoproject.com/changeset?new=django%404919&old=django%404918

Revision 4918 works.
Revision 4919 doesn't work.

I often use words like "Asociación", for example, and with revision 4919 or newer, non-ASCII characters don't work.
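The byte and position in the reported error line up with this example word, which suggests (but does not prove) that such a string is what fails. A minimal sketch, assuming Python 3 (the ticket itself is about Python 2): "Asociación" encoded as UTF-8 has the byte 0xc3 at position 8, exactly where an ASCII decode gives up.

```python
# Python 3 sketch: the UTF-8 bytes a Python 2 __str__ would return for "Asociación".
raw = "Asociación".encode("utf-8")  # b'Asociaci\xc3\xb3n'

error = None
try:
    # Python 2 performed this ASCII decode implicitly when mixing str and unicode.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

print(error)  # 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
print(raw.decode("utf-8"))  # Asociación
```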

The changeset changes html = str(html) to html = smart_unicode(html).

Thank you

I'm sorry, but I'm a newbie and I don't know how to resolve it.

comment:3 by Malcolm Tredinnick, 18 years ago

Description: modified (diff)

Fixed formatting in description.

comment:4 by Malcolm Tredinnick, 18 years ago

I suspect this is partly because [4919] has exposed a bug in smart_unicode(). We should be treating all str instances as UTF-8 encoded, as per PEP 263 (settings.DEFAULT_CHARSET doesn't play a role there). Before I can fix that, I need to fix form input handling so that the only bytestrings we have floating around in newforms are UTF-8 strings. That is work in progress. Then I'll make it work with oldforms.
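The rule described here, that every bytestring is assumed to be UTF-8 rather than being decoded with settings.DEFAULT_CHARSET, can be sketched as a small normalising helper. This assumes Python 3, and to_text is a hypothetical name; it is not Django's smart_unicode.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text, treating every bytestring as UTF-8."""
    if isinstance(value, str):
        return value  # already text
    if isinstance(value, bytes):
        return value.decode(encoding)  # fixed encoding, not a per-site setting
    return str(value)  # other objects: use their text representation


print(to_text(b"Asociaci\xc3\xb3n"))  # Asociación
print(to_text(4919))  # 4919
```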

comment:5 by Malcolm Tredinnick, 18 years ago

What is the DEFAULT_CHARSET setting in your Django settings?

Is the string returned from your models' __str__ method encoded in the DEFAULT_CHARSET encoding? If not, you can't expect it to work. I suspect you are using UTF-8 internally everywhere, but can you please confirm that?

comment:6 by karsu, 18 years ago

I confirm that [4919] breaks admin interface if you are using non-ascii characters.

DEFAULT_CHARSET = 'utf-8'

I made test case for that problem.

form_tests = r"""
>>> from django.oldforms import *

>>> CHOICES = ((0, u'\x84\x94'),)
>>> f = SelectField(field_name='test', choices=CHOICES)
>>> print f

>>> f = SelectField(field_name=u'\x84\x94', choices=None)
>>> print f
"""

comment:7 by ruben.perez@…, 18 years ago

Hi, I can confirm that my settings.DEFAULT_CHARSET is utf-8.

Thank you very much for your help.

in reply to:  6 comment:8 by Malcolm Tredinnick, 18 years ago

Replying to karsu:

I confirm that [4919] breaks admin interface if you are using non-ascii characters.

DEFAULT_CHARSET = 'utf-8'

I made test case for that problem.

This example doesn't work prior to [4919] (I tested with [4918]). This ticket is only about the change reported in the summary, so any related test case should work in [4918].

Thanks for making a test case, but it isn't related to this ticket.

comment:9 by mir@…, 18 years ago

Component: Internationalization → Template system
Triage Stage: Unreviewed → Accepted

I can also confirm that [4919] is broken. The problem is:

  • when I use oldforms, the context is all in utf-8 encoded bytestrings
  • with [4919], escape() now returns a unicode string
  • template.render joins the various strings, and during this python tries to decode the bytestrings to unicode strings. This happens with the python defaultencoding, which is 'ASCII'. If any bytestring contains non-ASCII characters, you get the exception mentioned above.
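The failure mode in the last bullet can be simulated in Python 3, which no longer coerces implicitly. In this sketch, py2_style_join is a hypothetical helper that mimics what joining mixed pieces into a unicode result effectively did in Python 2 (decode each bytestring with ASCII); the HTML pieces are invented for illustration.

```python
def py2_style_join(pieces):
    """Hypothetical helper: join pieces the way Python 2's u"".join() effectively did,
    decoding any bytestring with the ASCII default encoding first."""
    return "".join(p.decode("ascii") if isinstance(p, bytes) else p for p in pieces)


# escape() now returns text, while the oldforms context still holds UTF-8 bytestrings.
pieces = ["<option>", "Asociación".encode("utf-8"), "</option>"]

failure = None
try:
    py2_style_join(pieces)
except UnicodeDecodeError as exc:
    failure = exc

# Decoding each bytestring explicitly as UTF-8 before joining avoids the error.
fixed = "".join(p.decode("utf-8") if isinstance(p, bytes) else p for p in pieces)
print(fixed)  # <option>Asociación</option>
```

The fix in the last lines only works because every bytestring is known to be UTF-8, which is the assumption the earlier comments argue for.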

I'm not familiar with the admin interface, but I guess it still uses oldforms and also escape() somewhere.

Reverting [4919] solved the issues for me.

comment:10 by mir@…, 18 years ago

Sorry about the format. Here it is again in a readable format.

I can also confirm that [4919] is broken. The problem is:

  • when I use oldforms, the context is all in utf-8 encoded bytestrings
  • with [4919], escape() now returns a unicode string
  • template.render joins the various strings, and during this python tries to decode the bytestrings to unicode strings. This happens with the python defaultencoding, which is 'ASCII'. If any bytestring contains non-ASCII characters, you get the exception mentioned above.

I'm not familiar with the admin interface, but I guess it still uses oldforms and also escape() somewhere.

Reverting [4919] solved the issues for me.

comment:11 by Malcolm Tredinnick, 18 years ago

(In [4933]) Backed out a portion of [4919] until I can make it work smoothly with
oldforms. Refs #3924.

comment:12 by anonymous, 18 years ago

Cc: jm.bugtracking@… added

comment:13 by olga@…, 18 years ago

Hello,
I also get this error. I use newforms, and SelectDateWidget() raises a UnicodeError. Previously I solved this problem by editing django\newforms\util.py and removing the call to smart_unicode_lazy:

def smart_unicode(s):
    if isinstance(s, Promise):
        # The input is something from gettext_lazy or similar. We don't want to
        # translate it until render time, so defer the conversion.

        # return smart_unicode_lazy(s)
        return smart_unicode_immediate(s) 
    else:
        return smart_unicode_immediate(s) 

I've upgraded Django to revision 4939 and the problem still exists, so I tried restoring the old behaviour in django\utils\encoding.py:

#def smart_unicode(s):
#    if isinstance(s, Promise):
        # The input is the result of a gettext_lazy() call, or similar. It will
        # already be encoded in DEFAULT_CHARSET on evaluation and we don't want
        # to evaluate it until render time.
        # FIXME: This isn't totally consistent, because it eventually returns a
        # bytestring rather than a unicode object. It works wherever we use
        # smart_unicode() at the moment. Fixing this requires work in the
        # i18n internals.
#        return s
#    if not isinstance(s, basestring,):
#        if hasattr(s, '__unicode__'):
#            s = unicode(s)
#        else:
#            s = unicode(str(s), settings.DEFAULT_CHARSET)
#    elif not isinstance(s, unicode):
#        s = unicode(s, settings.DEFAULT_CHARSET)
#    return s

def smart_unicode(s):
    if isinstance(s, Promise):
        # The input is something from gettext_lazy or similar. We don't want to
        # translate it until render time, so defer the conversion.
        return smart_unicode_immediate(s)
    else:
        return smart_unicode_immediate(s)

def smart_unicode_immediate(s):
    if not isinstance(s, basestring):
        if hasattr(s, '__unicode__'):
            s = unicode(s)
        else:
            s = unicode(str(s), settings.DEFAULT_CHARSET)
    elif not isinstance(s, unicode):
        s = unicode(s, settings.DEFAULT_CHARSET)
    return s

Everything works fine. I don't know what is wrong. Looking forward to some advice. Thanks.

comment:14 by Malcolm Tredinnick, 18 years ago

The solution in the previous comment will fail for Forms that are created at import time and then the locale() changes before they are used. It forces the translation to occur at the time smart_unicode() is called, rather than at the moment of output (when the locale will be set correctly). Which means almost 100% failures for many internationalised applications.

I am working on fixing the FIXME in that code at the moment.
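The import-time problem described above can be illustrated with a toy translation layer. This is a minimal Python 3 sketch of the deferred-translation idea behind gettext_lazy and Promise, not Django's implementation; CATALOGS, activate_language, translate and LazyMessage are all hypothetical stand-ins.

```python
# Hypothetical message catalogues standing in for compiled gettext files.
CATALOGS = {"en": {"March": "March"}, "de": {"March": "März"}}
current = {"language": "en"}


def activate_language(code):
    """Switch the active language, as a locale middleware would per request."""
    current["language"] = code


def translate(message):
    """Look the message up in the active catalogue (falls back to the message)."""
    return CATALOGS[current["language"]].get(message, message)


class LazyMessage:
    """Stores the untranslated message and translates only when rendered."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return translate(self.message)


# Both labels are created at import time, while the default language is active.
eager_label = translate("March")   # translated now
lazy_label = LazyMessage("March")  # translation deferred

activate_language("de")            # later, the request's locale becomes German
print(eager_label)                 # March  (stale: fixed before the switch)
print(lazy_label)                  # März
```

Converting the lazy value to text early, as the workaround in comment 13 does, turns the second case into the first.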

in reply to:  14 ; comment:15 by olga@…, 18 years ago

Replying to mtredinnick:

The solution in the previous comment will fail for Forms that are created at import time and then the locale() changes before they are used. It forces the translation to occur at the time smart_unicode() is called, rather than at the moment of output (when the locale will be set correctly). Which means almost 100% failures for many internationalised applications.

I am working on fixing the FIXME in that code at the moment.

Thanks for your reply. For now I only use German, but in the future, maybe in a month, it will be a multilingual application. Is there a correct solution at the moment, or will working code be available soon? Thanks again!

comment:16 by Malcolm Tredinnick, 18 years ago

Regarding comment 13, you didn't actually say what you did to cause the error, just what you did to stop it. If you have a short example, could you post it, please. I'd like to see somebody else's repeatable test cases to ensure I am not missing anything.

comment:17 by Malcolm Tredinnick, 18 years ago

#3947 has an example of the same problem. Add something like activate('es') before rendering the form to see the problem.

comment:18 by antono.aloy@…, 18 years ago

Hello!

I can also reproduce the problem in the admin interface when trying to add a record to a table that has a ForeignKey relation to another table containing UTF-8 characters.

My language code is 'es-es' and my default charset is utf-8.

comment:19 by antoni.aloy@…, 18 years ago

Cc: antoni.aloy@… added

I can confirm there is a Unicode-related error when rendering the ForeignKey drop-down list in the admin application. I can reproduce it every time: just add characters such as 'à' or 'é' to one of the fields returned by the __str__ method of the foreign key.

Wrapping the __str__ output in unicode(yourstring, errors="ignore") gave me a quick-and-dirty workaround, but it removes all the offending characters.
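Why that workaround loses data can be seen in a short Python 3 sketch (an assumption; Python 2's unicode(s, errors="ignore") decodes with the ASCII default in the same way). The sample string is invented for illustration.

```python
raw = "Palma à Sóller".encode("utf-8")  # a UTF-8 bytestring, as a __str__ might return

lossy = raw.decode("ascii", errors="ignore")  # the workaround: drops every non-ASCII byte
correct = raw.decode("utf-8")  # decoding with the real encoding keeps the characters

print(lossy)  # Palma  Sller
print(correct)  # Palma à Sóller
```

The error disappears only because the offending bytes are thrown away; decoding with the encoding the data actually uses keeps them.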

in reply to:  16 comment:20 by anonymous, 18 years ago

Replying to mtredinnick:

Regarding comment 13, you didn't actually say what you did to cause the error, just what you did to stop it. If you have a short example, could you post it, please. I'd like to see somebody else's repeatable test cases to ensure I am not missing anything.

Hello!
Here is an example:
tmp.py:

from django import newforms as forms 
from django.shortcuts import render_to_response
from django.newforms.extras import SelectDateWidget
import datetime

class TestForm(forms.Form):
    birthdate = forms.DateField(
        widget=SelectDateWidget(years=xrange(datetime.date.today().year, 1900, -1))
    )

def unicode_test(request):
    form = TestForm()
    return render_to_response('test.html', {'form': form})

test.html:

<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<title>Test</title>
	</head>
	<body>
		{{ form }}
	</body>
</html>

I also have these settings:

LANGUAGE_CODE = 'de'
gettext = lambda s: s
LANGUAGES = (
    ('de', gettext('German')),
)

MIDDLEWARE_CLASSES = (    
    'django.contrib.csrf.middleware.CsrfMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',    
    'django.middleware.common.CommonMiddleware',    
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
)

TEMPLATE_CONTEXT_PROCESSORS = (
    "django.core.context_processors.auth",
    "django.core.context_processors.debug",
    "django.core.context_processors.i18n",
)

DATABASE_OPTIONS = {'charset': 'utf8'} 
DEFAULT_CHARSET = 'utf-8'

comment:21 by olga@…, 18 years ago

Sorry, the previous example is mine. I've also found that the problem is the month name "März": when I change _('March') to 'March', everything works fine.

in reply to:  15 comment:22 by Jari Pennanen, 18 years ago

Replying to mtredinnick:
The solution in the previous comment will fail for Forms that are created at import time and then the locale() changes before they are used. It forces the translation to occur at the time smart_unicode() is called, rather than at the moment of output (when the locale will be set correctly). Which means almost 100% failures for many internationalised applications.

I am working on fixing the FIXME in that code at the moment.

I debugged one of my problems and reached the same conclusion as the FIXME. One question, though: which ticket will you mark as fixed when this is ready? I really need a patch for this. For now I have ugly hacks that call decode() here and there, wherever gettext_lazy and smart_unicode are used, which is mostly in BaseForm.

comment:23 by Michael P. Jung, 18 years ago

Cc: mpjung@… added

comment:24 by alessandro.ronchi@…, 18 years ago

There is another similar problem in _html_output. I worked around it by commenting out the line marked FIXME!! in django_src/django/newforms/forms.py:


    def _html_output(self, normal_row, error_row, row_ender, help_text_html, errors_on_separate_row):
        "Helper function for outputting HTML. Used by as_table(), as_ul(), as_p()."
        top_errors = self.non_field_errors() # Errors that should be displayed above all fields.
        output, hidden_fields = [], []
        for name, field in self.fields.items():
            bf = BoundField(self, field, name)
            bf_errors = ErrorList([escape(error) for error in bf.errors]) # Escape and cache in local variable.
            if bf.is_hidden:
                if bf_errors:
                    top_errors.extend(['(Hidden field %s) %s' % (name, e) for e in bf_errors])
                hidden_fields.append(unicode(bf))
            else:
                if errors_on_separate_row and bf_errors:
                    output.append(error_row % bf_errors)
                if bf.label:
                    label = escape(bf.label)
                    # Only add a colon if the label does not end in punctuation.
                    if label[-1] not in ':?.!':
                        label += ':'
                    label = bf.label_tag(label) or ''
                else:
                    label = ''
                if field.help_text:
                    #help_text = help_text_html % field.help_text FIXME!!
                    help_text = u''
                else:
                    help_text = u''
                output.append(normal_row % {'errors': bf_errors, 'label': label, 'field': unicode(bf), 'help_text': help_text})

comment:25 by Malcolm Tredinnick, 18 years ago

Resolution: fixed
Status: reopened → closed

The original problem reported in this bug was fixed by [4933], so I'm going to close this.

The broader issues that some comments raise are being addressed on the unicode branch and aren't related to the original problem report in any case (so should be separate tickets if they aren't fixed after the unicode branch merges).

comment:26 by gnudiff, 15 years ago

Component: Template system → django.contrib.admin
Resolution: fixed
Status: closed → reopened
Version: SVN → 1.0

I can confirm the same error as of today, Django 1.0.2 final.

The error seems to occur when admin interface has to return string representation of a foreign key, such as:

class Visitor(models.Model):
    firstName = models.CharField(max_length=60,help_text="Vārds")
    lastName = models.CharField(max_length=60,help_text="Uzvārds")

    def __unicode__(self):
        return "%s %s" % (self.lastName,self.firstName)
####
class Visit(models.Model):
    visitor = models.ForeignKey(Visitor, help_text="Pacients")
    visitDate = models.DateTimeField(help_text="Pieteiktais laiks")

    def __unicode__(self):
        return "%s @ %s" % (self.visitor,self.visitDate.isoformat())

In this case, if you have a Visitor with non-ASCII characters in the name, you can see him in the admin interface without problems, but as soon as you add a Visit for him and then try to access the list of all Visits, you get an error of the following type:

Request URL:  	http://127.0.0.1:8000/admin/callregister/visit/add/
Exception Type: 	DjangoUnicodeDecodeError
Exception Value: 	

'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128). You passed in <Visit: [Bad Unicode data]> (<class 'prs.callregister.models.Visit'>)

Exception Location: 	C:\Python25\lib\site-packages\django\utils\encoding.py in force_unicode, line 70
Python Executable: 	C:\Python25\python.exe
Python Version: 	2.5.0

in reply to:  26 comment:27 by Ramiro Morales, 15 years ago

Resolution: fixed
Status: reopened → closed

Replying to gnudiff:

I can confirm the same error as of today, Django 1.0.2 final.

Please don't reopen tickets that were fixed against a much earlier version and are two years old. If, after asking in support channels such as the mailing list, you are sure it is a Django bug and not a problem with your code, then open a new ticket.

>     def __unicode__(self):
>         return "%s %s" % (self.lastName,self.firstName)

You are supposed to return Unicode data from the __unicode__ method. Try with return u"%s %s" % (self.lastName,self.firstName).

Restoring ticket status.

comment:28 by Karen Tracey, 15 years ago

It's actually Visit's __unicode__ method that is the problem in this specific case. It needs to be:

    def __unicode__(self):
        return u"%s @ %s" % (self.visitor, self.visitDate.isoformat())

(Not that fixing Visitor's as well isn't a good idea, it just won't fix the exception mentioned.)
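For comparison, a minimal sketch assuming Python 3, which has a single text type. The two classes are plain hypothetical stand-ins for the models (not Django models), and the sample name is invented. Interpolating one object's __str__ into another's produces text even without a u prefix, so this particular exception should not arise in that form.

```python
import datetime


class Visitor:
    """Plain stand-in for the Visitor model (not a Django model)."""

    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def __str__(self):
        return "%s %s" % (self.last_name, self.first_name)


class Visit:
    """Plain stand-in for the Visit model; "%s" % self.visitor calls Visitor.__str__."""

    def __init__(self, visitor, visit_date):
        self.visitor = visitor
        self.visit_date = visit_date

    def __str__(self):
        return "%s @ %s" % (self.visitor, self.visit_date.isoformat())


visit = Visit(Visitor("Jānis", "Bērziņš"), datetime.datetime(2009, 3, 1, 10, 30))
print(visit)  # Bērziņš Jānis @ 2009-03-01T10:30:00
```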

comment:29 by gnudiff, 15 years ago

Yep, sorry, you are right!

What threw me off track was that the behaviour was inconsistent: non-ASCII characters in a model's __unicode__ method worked fine, and the problem only started when one model's __unicode__ called another model's __unicode__.

comment:30 by Karen Tracey, 15 years ago

Yes, Python's behavior here can cause some confusion. Note "%s" % x may evaluate to either Unicode or a bytestring depending on the type of x:

>>> u = u"Unicode!"
>>> b = 'Bytestring'
>>> "%s" % u
u'Unicode!'
>>> "%s" % b
'Bytestring'

This is why you often won't see a problem with your Visitor __unicode__ as originally coded: usually the lastName and firstName attributes will be Unicode values, so "%s %s" % (self.lastName,self.firstName) will evaluate to a Unicode value and you won't see a problem.

For more complicated situations, say a class that supports both __str__ and __unicode__, whether or not you specify u'' on the interpolation will control what type of result you get:

>>> class Thing(object):
...     def __init__(self, x):
...         self.x = x
...     def __str__(self):
...         return self.x.encode('utf-8')
...     def __unicode__(self):
...         return self.x
... 
>>> t = Thing(u'\u0101')
>>> "%s" % t
'\xc4\x81'
>>> u"%s" % t
u'\u0101'

This is where your Visit __unicode__ method as originally coded has a problem. self.visitor has both __str__ and __unicode__ methods, so unless you force the __unicode__ one to be called by using u'' for the interpolation, the result will be a bytestring, and if that contains non-ascii chars then the automatic coercion to unicode using the ascii codec will generate an exception.

>>> unicode("%s" % t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>>> 

So yes, sometimes it seems like you don't necessarily absolutely need the u'' for the __unicode__ return value. But figuring out when exactly you need it and when you don't can be complicated so it's a good idea to use it as a matter of routine.
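Python 3 drops the implicit coercion, but the same split survives in bytes formatting. A minimal sketch, assuming Python 3.5+ (PEP 461): "%s" % t calls __str__, while b"%s" % t calls __bytes__, so the format literal still decides the result type. Thing is adapted from the session above, with __str__ and __bytes__ in place of __unicode__ and __str__.

```python
class Thing:
    def __init__(self, x):
        self.x = x

    def __str__(self):
        return self.x  # text, like __unicode__ in Python 2

    def __bytes__(self):
        return self.x.encode("utf-8")  # bytes, like __str__ in Python 2


t = Thing("\u0101")
text = "%s" % t  # calls __str__   -> 'ā'
data = b"%s" % t  # calls __bytes__ -> b'\xc4\x81'

error = None
try:
    data.decode("ascii")  # the same decode that failed in the traceback
except UnicodeDecodeError as exc:
    error = exc

print(error)  # 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
```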
