lyricwiki: convert numeric HTML escape sequences to proper characters
author Thomas Jansen <mithi@mithi.net>
Sun, 1 Nov 2009 23:49:11 +0000 (00:49 +0100)
committer Thomas Jansen <mithi@mithi.net>
Sun, 1 Nov 2009 23:49:11 +0000 (00:49 +0100)
I've stumbled across several cases of obfuscated lyrics that use numeric
HTML escape sequences. Decode them with CGI::unescapeHTML before printing.

lyrics/02-lyricwiki.rb

index b3b702825a4414f67f8c067b77d8d033bdaf06cb..db7b970307dd03d944eb76e971cedaf8698c72ce 100755 (executable)
@@ -23,6 +23,7 @@
 
 require 'uri'
 require 'net/http'
+require 'cgi'
 
 url = "http://lyrics.wikia.com/api.php?action=lyrics&fmt=xml&func=getSong" + \
     "&artist=#{URI.escape(ARGV[0])}&song=#{URI.escape(ARGV[1])}"
@@ -47,4 +48,4 @@ if not $1 =~ /^.*<\/div>(.*?)$/im
        exit(1)
 end
 
-puts $1.gsub(/<br \/>/, "\n")
+puts CGI::unescapeHTML($1.gsub(/<br \/>/, "\n"))
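
A minimal sketch of what the patched line does, for illustration only. The
sample string and the decode_lyrics helper are hypothetical and not part of
the script; the only call taken from the patch is CGI.unescapeHTML applied
after the <br /> substitution. The sample stays within ASCII, since decoding
of higher code points may depend on the string's encoding.

```ruby
require 'cgi'

# Hypothetical sample: "Hello" written as decimal numeric escapes, a
# <br /> line break, a hexadecimal escape for "W", and a named entity.
obfuscated = "&#72;&#101;&#108;&#108;&#111;<br />&#x57;orld &amp; more"

# Same two steps as the patched line: turn <br /> into newlines, then
# decode HTML escape sequences into the characters they stand for.
def decode_lyrics(text)
  CGI.unescapeHTML(text.gsub(/<br \/>/, "\n"))
end

puts decode_lyrics(obfuscated)
```

With the old code the escapes would have been printed verbatim; with the
extra CGI.unescapeHTML call the sample prints "Hello" and "World & more" on
two lines.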