Overkill Email Obfuscation with Ruby and Javascript

Posted by unixmonkey on March 12, 2008

The web is a generally free and open place for all types of communication, but if you put your email address on even one website, you can expect an email-harvesting robot spider to find that address and send it to its spammer overlords.

Once on a spammer’s list, you can expect to get all kinds of interesting stock tips, products to enhance your manhood, and friendly letters from Nigerian diplomats.

If you simply have too little to do in the day, this can be a great way to meet new people and start a career in day trading. However, some of us are just too darn busy to stop what we are doing every 2/3rds of a second to check our email, but still need it for keeping in touch with friends, family, and business contacts.

Using a few tips pulled from the web, I set out to create a nice link helper for Ruby / Rails that displays email links which work indistinguishably from regular mailto: links, and which even degrade gracefully for users without javascript.

Let's not even display the email address in the page source at all. Instead, we break it into pieces and use a little javascript to put it back together and render it after the fact.

# Takes in an email address and (optionally) anchor text,
# its purpose is to obfuscate email addresses so spiders and
# spammers can't harvest them.
def js_antispam_email_link(email, linktext=email)
    user, domain = email.split('@')
    # noscript fallback: only show the anchor text when it isn't the raw address
    fallback = linktext == email ? '' : "#{linktext} "
    # if linktext wasn't specified, throw email address builder into js document.write statement
    linktext = "'+'#{user}'+'@'+'#{domain}'+'" if linktext == email
    out =  "<noscript>#{fallback}#{user}(at)#{domain}</noscript>\n"
    out += "<script language='javascript'>\n"
    out += "  <!--\n"
    out += "    string = '#{user}'+'@'+'#{domain}';\n"
    out += "    document.write('<a href='+'ma'+'il'+'to:'+ string +'>#{linktext}</a>'); \n"
    out += "  //-->\n"
    out += "</script>\n"
    return out
end

This is probably good enough for 90% of those robots, but you know that if one spammer gets your address, he will likely share (or sell) it to all his friends. The weak spot here looks like the noscript version, so let's fuzz that up a bit by converting it to HTML character entities.

One of the earliest and simplest ways to obfuscate an email address is to convert each character into its HTML entity equivalent. This makes the source look nasty, but the browser renders it correctly, so the end user is none the wiser.

An address like abc@example.com will look like this in the source:

&#097;&#098;&#099;&#064;&#101;&#120;&#097;&#109;&#112;&#108;&#101;&#046;&#099;&#111;&#109;

Let’s build a simple method to convert a plaintext string into something like the above. I’m going to cheat and only convert a-z and A-Z and leave @ signs, dots, dashes, etc. alone.

# HTML encodes ASCII chars a-z, useful for obfuscating
# an email address from spiders and spammers
def html_obfuscate(string)
  output_array = []
  lower = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  upper = %w(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)
  char_array = string.split('')
  char_array.each do |char|  
    output = lower.index(char) + 97 if lower.include?(char)
    output = upper.index(char) + 65 if upper.include?(char)
    if output
      output_array << "&##{output};"
    else 
      output_array << char
    end
  end
  return output_array.join
end
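To see what that produces, here is a small stand-alone sketch. The name `entity_encode_letters` is made up for this example; it is a compact stand-in that should behave like html_obfuscate above (letters only, everything else untouched), not a replacement for it.

```ruby
# Hypothetical stand-in for html_obfuscate: encodes only ASCII letters
# as decimal HTML entities and leaves '@', '.', digits, and dashes alone.
def entity_encode_letters(string)
  string.gsub(/[A-Za-z]/) { |char| "&##{char.ord};" }
end

puts entity_encode_letters('abc@example.com')
# prints: &#97;&#98;&#99;@&#101;&#120;&#97;&#109;&#112;&#108;&#101;.&#99;&#111;&#109;
```

Note that the entities come out without the leading zeros shown in the earlier sample; browsers treat &#097; and &#97; the same way.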

Now, in our js_antispam_email_link method, we can “encrypt” the user and domain before sending them to the browser, like so:

def js_antispam_email_link(email, linktext=email)
  user, domain = email.split('@')
  user = html_obfuscate(user)
  domain = html_obfuscate(domain)
  ...

Not bad, but many spiders these days can still decode HTML entities and get at that address, so let's build up our defenses a bit more by adding some methods to really screw with those spiders.
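To give a feel for how little work that decoding takes, here is a rough sketch of what a harvester might do. The method name `decode_numeric_entities` is invented for this example, and it only handles decimal entities; a real spider would presumably cover hex and named entities too.

```ruby
# Hypothetical decoder: turns decimal HTML entities such as &#097; back
# into characters. Hex (&#x61;) and named (&amp;) entities are ignored.
def decode_numeric_entities(html)
  html.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr }
end

encoded = '&#097;&#098;&#099;&#064;&#101;&#120;&#097;&#109;&#112;&#108;&#101;&#046;&#099;&#111;&#109;'
puts decode_numeric_entities(encoded)
# prints: abc@example.com
```

One line of regex undoes the whole scheme, which is why entity encoding on its own is only a speed bump.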

We’ll write a method that encrypts a string with ROT13 and puts that on the webpage, and use some javascript to decrypt that on page display. ROT13 is a really simple cipher where you take characters a-z and shift them by half the alphabet.

This is a really simple one-liner borrowed from Jay Komineck:

# Rot13 encodes a string
def rot13(string)
  string.tr "A-Za-z", "N-ZA-Mn-za-m"
end
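A quick sanity check of that one-liner, using the same sample address as before. Because the shift is exactly half the alphabet, applying rot13 twice should hand back the original string, which is what lets the browser undo it later.

```ruby
# Same one-liner as above, repeated so this snippet runs on its own.
def rot13(string)
  string.tr 'A-Za-z', 'N-ZA-Mn-za-m'
end

scrambled = rot13('abc@example.com')
puts scrambled         # prints: nop@rknzcyr.pbz
puts rot13(scrambled)  # prints: abc@example.com
```

Only letters move; the @ sign and the dot pass through unchanged.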

Let's use this to really beef up our link helper by adding some javascript that can decipher it. The JS code is taken from Allan Odgaard:

string = '#{email}'.replace(/[a-zA-Z]/g, 
  function(c){ 
    return String.fromCharCode(
      (c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26
    );
  });
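If the nested ternary in that javascript is hard to follow, here is a rough Ruby port of the same character-code arithmetic. `rot13_by_charcode` is a name made up for this sketch; the helper itself keeps using the tr one-liner.

```ruby
# Hypothetical Ruby port of the javascript decoder, spelled out step by step.
def rot13_by_charcode(string)
  string.gsub(/[a-zA-Z]/) do |char|
    # Highest valid code for this letter's case: 'Z' is 90, 'z' is 122.
    limit = char <= 'Z' ? 90 : 122
    # Shift forward by 13, wrapping back by 26 if we ran past the end.
    code = char.ord + 13
    (code <= limit ? code : code - 26).chr
  end
end

puts rot13_by_charcode('nop@rknzcyr.pbz')
# prints: abc@example.com
```

The javascript version does the same thing, just with the assignment folded into the comparison.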

Now we’ve got some pretty strong defense against those pesky robots. Because it relies only on simple HTML character encoding and lightweight ROT13 ciphering, it shouldn’t be too taxing on your webserver to spit out a page with a few emails on it. Less sophisticated browsers still get the contact info, and everyone is a little bit happier to come home to a (relatively) clean inbox.

Here’s the whole shebang put together. Put this in application_helper.rb if you are using Rails:

# Rot13 encodes a string
def rot13(string)
  string.tr "A-Za-z", "N-ZA-Mn-za-m"
end
 
# HTML encodes ASCII chars a-z, useful for obfuscating
# an email address from spiders and spammers
def html_obfuscate(string)
  output_array = []
  lower = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  upper = %w(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)
  char_array = string.split('')
  char_array.each do |char|  
    output = lower.index(char) + 97 if lower.include?(char)
    output = upper.index(char) + 65 if upper.include?(char)
    if output
      output_array << "&##{output};"
    else 
      output_array << char
    end
  end
  return output_array.join
end
 
# Takes in an email address and (optionally) anchor text,
# its purpose is to obfuscate email addresses so spiders and
# spammers can't harvest them.
def js_antispam_email_link(email, linktext=email)
  user, domain = email.split('@')
  user   = html_obfuscate(user)
  domain = html_obfuscate(domain)
  # noscript fallback: only show the anchor text when it isn't the raw address
  fallback = linktext == email ? '' : "#{linktext}<br/>"
  # if linktext wasn't specified, throw encoded email address builder into js document.write statement
  linktext = "'+'#{user}'+'@'+'#{domain}'+'" if linktext == email
  rot13_encoded_email = rot13(email) # obfuscate email address as rot13
  out =  "<noscript>#{fallback}<small>#{user}(at)#{domain}</small></noscript>\n" # js disabled browsers see this
  out += "<script language='javascript'>\n"
  out += "  <!--\n"
  out += "    string = '#{rot13_encoded_email}'.replace(/[a-zA-Z]/g, function(c){ return String.fromCharCode((c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26);});\n"
  out += "    document.write('<a href='+'ma'+'il'+'to:'+ string +'>#{linktext}</a>'); \n"
  out += "  //-->\n"
  out += "</script>\n"
  return out
end

I hope this helps somebody out there. Please leave a comment if you have any suggestions.

Comments

  1. [...] Obfuscation with Ruby and Javascript Published in March 14th, 2008 Posted by Pest Control in Pests unknown wrote an interesting post today onHere’s a quick excerptRobot Spiders from Runaway The web is [...]

  2. Jan Wilmans Sat, 19 Jul 2008 17:16:31 EDT

    Hi,

    Nice code!

    I’ve changed it a bit to my liking, I think it’s both shorter and safer:
    The only thing I broke was noscript support, but I really don’t care that
    people using a non-javascript browser can’t email me…

    # Rot13 encodes a string
    def rot13(string)
      string.tr "A-Za-z", "N-ZA-Mn-za-m"
    end
     
    # Takes in an email address and (optionally) anchor text,
    # its purpose is to send the email address rot13'd to
    # javascript so it is never actually sent in plain text
    def antispam_email_link(email, linktext=email)
     
      content = "it is : <a>" + linktext + "</a>"
      rot13_encoded_email = rot13(content) # obfuscate email address as rot13
     
      out = "<script language='javascript'>\n"
      out += "  <!--\n"
      out += "    string = '#{rot13_encoded_email}'.replace(/[a-zA-Z]/g, function(c){ return String.fromCharCode((c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26);});\n"
      out += "    document.write(string); \n"
      out += "  //-->\n"
      out += "</script>\n"
      return out
    end
  3. Justin R. Sat, 02 Aug 2008 02:05:05 EDT

    def html_obfuscate(string)
      lower = ('a'..'z').to_a
      upper = ('A'..'Z').to_a
      string.split('').map { |char|
        output = lower.index(char) + 97 if lower.include?(char)
        output = upper.index(char) + 65 if upper.include?(char)
        output ? "&##{output};" : char
      }.join
    end


    is a bit cleaner…

  4. Macario Ortega Thu, 16 Jul 2009 12:35:47 EDT

    Yet a bit cleaner:

    def html_obfuscate string
      string.unpack('C*').collect{ |char| "&##{ char };" }.join
    end

  5. Anonymous Thu, 12 Nov 2009 21:41:20 EDT

    Good stuff. Here is another method in Rails which relies on BlueCloth…

    def email_link(email)
    markdown(“”).gsub!(//?/, ”)
    end

  6. Juan Capristán Tue, 02 Aug 2011 08:48:17 EDT

    Rails 3 has the helper method “mail_to” (I’ve started railing with rails 3, so I really don’t know if this helper was available before, but I guess no). This method makes mail obfuscation as simple as adding parameters to it. For example:

    mail_to "abc@abc.com", "my email", :encode=>:javascript, :replace_at=>"**at**", :replace_dot=>"xyzdotxyz"

    Really simple ;)

    More info about the helper: http://api.rubyonrails.org/classes/ActionView/Helpers/UrlHelper.html

  7. unixmonkey Tue, 02 Aug 2011 09:27:57 EDT

    @Juan

    Rails has pretty much always had mail_to, but the point of this isn’t to just mangle the email address, but mangle it in a way that the javascript can reconstruct it into a plain-jane email link, and still protect it from web-crawlers that harvest email addresses for spamming.

    mail_to 'somewhere@example.com', 'my email', :encode => :javascript is a pretty good solution, but since it is built into Rails, I would assume some email harvesters can already reverse-engineer it. With a custom solution, that is much less likely.
