From: Thomas Jansen
Date: Sun, 1 Nov 2009 23:49:11 +0000 (+0100)
Subject: lyricwiki: convert numeric HTML escape sequences to proper characters
X-Git-Tag: release-0.16~44
X-Git-Url: https://git.tokkee.org/?a=commitdiff_plain;h=eae0b74ddd688507209e85b54ca3a4e67a7f5aed;p=ncmpc.git

lyricwiki: convert numeric HTML escape sequences to proper characters

I've stumbled across several cases of lyrics obfuscated with numeric
HTML escape sequences.
---
diff --git a/lyrics/02-lyricwiki.rb b/lyrics/02-lyricwiki.rb
index b3b7028..db7b970 100755
--- a/lyrics/02-lyricwiki.rb
+++ b/lyrics/02-lyricwiki.rb
@@ -23,6 +23,7 @@
 
 require 'uri'
 require 'net/http'
+require 'cgi'
 
 url = "http://lyrics.wikia.com/api.php?action=lyrics&fmt=xml&func=getSong" + \
     "&artist=#{URI.escape(ARGV[0])}&song=#{URI.escape(ARGV[1])}"
@@ -47,4 +48,4 @@ if not $1 =~ /^.*<\/div>(.*?)$/im
 	exit(1)
 end
 
-puts $1.gsub(/<br \/>/, "\n")
+puts CGI::unescapeHTML($1.gsub(/<br \/>/, "\n"))
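
The effect of the new last line can be illustrated with a minimal Ruby sketch. The helper name clean_lyrics and the sample strings are hypothetical, made up for illustration rather than taken from ncmpc or LyricWiki, and the sketch assumes a Ruby version that still ships the cgi library.

```ruby
require 'cgi'

# Hypothetical helper mirroring the patched last line of 02-lyricwiki.rb:
# first turn each literal <br /> tag into a newline, then decode HTML
# character references. CGI.unescapeHTML decodes decimal (&#72;) and
# hexadecimal (&#x77;) numeric references as well as named ones like &amp;.
def clean_lyrics(fragment)
  CGI.unescapeHTML(fragment.gsub(/<br \/>/, "\n"))
end

# Made-up input: "Hello" spelled with decimal references, "wo" with hex ones.
sample = "&#72;&#101;&#108;&#108;&#111;<br />&#x77;&#x6F;rld &amp; more"
puts clean_lyrics(sample)
```

Because the <br /> substitution runs before unescaping, an escaped tag such as &lt;br /&gt; inside the lyric text is decoded to the literal text <br /> instead of being turned into a line break.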