add stupid temporary hack to strip out utf8mb4 chars that are screwing up mysql

4-byte utf8 chars like emoji are passed around in ruby fine, but
when they are put into mysql queries, strings get truncated at the
first mb4 character.  to prevent truncation, strip out mb4
characters in most user-controlled fields like comments, story
descriptions and titles, and messages.

to properly support utf8mb4, mysql server 5.5 is needed, the table
encodings need to be changed to utf8mb4, and the mysql2 gem needs to
be upgraded once it supports utf8mb4:

https://github.com/brianmario/mysql2/issues/249
joshua stein 2012-11-07 21:58:10 -06:00
parent 0eac1c375a
commit 9ece6666bf
4 changed files with 38 additions and 2 deletions
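
The table conversion step named in the commit message could look roughly like the sketch below. It only builds the SQL strings and does not touch a database. The helper name `utf8mb4_conversion_sql`, the table names (guessed from the Comment, Message and Story models) and the `utf8mb4_unicode_ci` collation are assumptions, not part of this commit.

```ruby
# hypothetical helper, not part of the commit: builds one ALTER TABLE
# statement per table to convert it to utf8mb4
def utf8mb4_conversion_sql(tables, collation = "utf8mb4_unicode_ci")
  tables.map do |t|
    "ALTER TABLE `#{t}` CONVERT TO CHARACTER SET utf8mb4 COLLATE #{collation}"
  end
end

# table names are assumed from the model names touched by this commit
statements = utf8mb4_conversion_sql(%w(comments messages stories))
statements.each {|s| puts s }
```

The statements would still need to be run against a mysql 5.5 server, and the client connection would have to use utf8mb4 as well, which is what the mysql2 issue linked in the commit message is about.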

@@ -235,7 +235,8 @@ class Comment < ActiveRecord::Base
   end
   def comment=(com)
-    self[:comment] = com.to_s.rstrip
+    # TODO: remove remove_mb4 hack
+    self[:comment] = com.to_s.rstrip.remove_mb4
     self.markeddown_comment = self.generated_markeddown_comment
   end

@@ -76,6 +76,16 @@ class Message < ActiveRecord::Base
       errors.add(:recipient_username, "is not a valid user")
     end
   end
+  # TODO: remove remove_mb4 hack
+  def body=(b)
+    self[:body] = b.to_s.remove_mb4
+  end
+  # TODO: remove remove_mb4 hack
+  def subject=(s)
+    self[:subject] = s.to_s.remove_mb4
+  end
   def linkified_body
     Markdowner.to_html(self.body)

@@ -313,9 +313,15 @@ class Story < ActiveRecord::Base
     self[:url] = u
   end
+  # TODO: remove remove_mb4 hack
+  def description=(d)
+    self[:description] = d.to_s.remove_mb4
+  end
   def title=(t)
     # change unicode whitespace characters into real spaces
-    self[:title] = t.strip
+    # TODO: remove remove_mb4 hack
+    self[:title] = t.strip.remove_mb4
   end
   def title_as_url

@@ -9,3 +9,22 @@ module ActiveRecord
     end
   end
 end
+# XXX stupid hack to strip out utf8mb4 chars that may break mysql queries
+# TODO upgrade to mysql 5.5, convert tables to utf8mb4, upgrade mysql2 gem when
+# it supports utf8mb4, and remove this hack
+class String
+  def remove_mb4
+    t = "".force_encoding(self.encoding)
+    self.each_char do |c|
+      if c.bytesize == 4
+        t << " "
+      else
+        t << c
+      end
+    end
+    t
+  end
+end
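
A quick way to see what `remove_mb4` does, assuming UTF-8 input: each 4-byte character becomes a single space, while 2- and 3-byte characters pass through untouched. The sketch below repeats the method so it runs on its own (with a `.dup` added so it also works where string literals are frozen). `remove_mb4_regex` is a hypothetical alternative that is not in this commit; it relies on the fact that 4-byte UTF-8 sequences encode exactly the code points U+10000 through U+10FFFF.

```ruby
class String
  # same logic as the remove_mb4 added in this commit
  def remove_mb4
    t = "".dup.force_encoding(self.encoding)
    self.each_char do |c|
      t << (c.bytesize == 4 ? " " : c)
    end
    t
  end

  # hypothetical alternative, not part of the commit; assumes self is
  # valid UTF-8, where 4-byte chars are code points U+10000..U+10FFFF
  def remove_mb4_regex
    self.gsub(/[\u{10000}-\u{10FFFF}]/, " ")
  end
end

s = "hi \u{1F600} there"          # U+1F600 is a 4-byte emoji
puts s.remove_mb4.inspect         # => "hi   there"
puts s.remove_mb4_regex.inspect   # => "hi   there"

kept = "caf\u00e9 \u20ac5"        # 2-byte and 3-byte chars
puts kept.remove_mb4 == kept      # => true
```

The regex version would raise on strings in an incompatible encoding, so the per-character `bytesize` check in the commit is the more forgiving of the two.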