Overkill Email Obfuscation with Ruby and Javascript

Posted by unixmonkey on March 12, 2008

The web is a generally free and open place for all types of communication, but if you put your email address on even one website, you can expect an email-harvesting robot spider to find that address and send it to its spammer overlords.

Once on a spammer’s list, you can expect to get all kinds of interesting stock tips, products to enhance your manhood, and friendly letters from Nigerian diplomats.

If you simply have too little to do in the day, this can be a great way to meet new people and start a career in day trading. However, some of us are just too darn busy to stop what we are doing every 2/3rds of a second to check our email, yet we still need it for keeping in contact with friends, family, and business contacts.

Using a few tips pulled from the web, I set out to create a nice link helper for Ruby / Rails that displays email links that work indistinguishably from regular mailto: links, and that even degrade gracefully for users without JavaScript.

Let’s not even put the email address in the page source at all. Instead, we break it into pieces and use a little JavaScript to put it back together when the page renders.

# Takes in an email address and (optionally) anchor text,
# its purpose is to obfuscate email addresses so spiders and
# spammers can't harvest them.
def js_antispam_email_link(email, linktext=email)
  user, domain = email.split('@')
  # if linktext wasn't specified, throw email address builder into js document.write statement
  js_linktext = linktext == email ? "'+'#{user}'+'@'+'#{domain}'+'" : linktext
  # the noscript fallback only repeats linktext when it is custom text, never the raw address
  fallback = linktext == email ? "" : "#{linktext} "
  out =  "<noscript>#{fallback}#{user}(at)#{domain}</noscript>\n"
  out += "<script type='text/javascript'>\n"
  out += "  <!--\n"
  out += "    var string = '#{user}'+'@'+'#{domain}';\n"
  out += "    document.write('<a href='+'ma'+'il'+'to:'+ string +'>#{js_linktext}</a>'); \n"
  out += "  //-->\n"
  out += "</script>\n"
  return out
end
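
To see why breaking the address up helps, here is a minimal sketch of what a naive harvester might do with the page source. The `EMAIL_PATTERN` regex and the `harvest` method are made up for illustration; real spiders vary, and plenty of them are smarter than this.

```ruby
# Hypothetical, simplified pattern a naive harvester might use.
# Real spiders differ; this is only for illustration.
EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Returns every substring of the page source that looks like an address.
def harvest(html)
  html.scan(EMAIL_PATTERN)
end

plain  = "<a href='mailto:abc@example.com'>abc@example.com</a>"
pieces = "document.write('<a href='+'ma'+'il'+'to:'+ 'abc'+'@'+'example.com' +'>Contact</a>');"

p harvest(plain)   # both copies of the address are found
p harvest(pieces)  # nothing matches, because no intact address appears in the source
```

The pattern needs address characters directly on both sides of the @ sign, and the quote marks around each piece get in the way.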

This is probably good enough for 90% of those robots, but you know that if one spammer gets your address, he will likely share (or sell) it to all his friends. The weak spot here looks like the noscript version, so let’s fuzz that up a bit by converting it to HTML character entities.

One of the earliest and simplest ways to obfuscate an email address is to convert each character into its HTML entity equivalent. This makes the source look nasty, but the browser renders it correctly, so the end-user is none the wiser.

An address like abc@example.com will look like this in the source:

&#097;&#098;&#099;&#064;&#101;&#120;&#097;&#109;&#112;&#108;&#101;&#046;&#099;&#111;&#109;

Let’s build a simple method to convert a plaintext string into something like the above. I’m going to cheat and only convert a-z and A-Z and leave @ signs, dots, dashes, etc. alone.

# HTML encodes ASCII chars a-z and A-Z, useful for obfuscating
# an email address from spiders and spammers
def html_obfuscate(string)
  output_array = []
  lower = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  upper = %w(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)
  char_array = string.split('')
  char_array.each do |char|  
    output = lower.index(char) + 97 if lower.include?(char)
    output = upper.index(char) + 65 if upper.include?(char)
    if output
      output_array << "&##{output};"
    else 
      output_array << char
    end
  end
  return output_array.join
end
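
If you prefer something shorter, the same conversion can probably be squeezed into a single gsub. This is only a sketch: the name `html_obfuscate_compact` is made up here, and it assumes Ruby 1.9 or later, where `String#ord` is available.

```ruby
# Hypothetical compact variant of html_obfuscate (not from the original post).
# Assumes Ruby 1.9 or later, where String#ord is available.
def html_obfuscate_compact(string)
  # Replace each ASCII letter with its decimal character reference;
  # everything else (@, dots, dashes, digits) passes through unchanged.
  string.gsub(/[a-zA-Z]/) { |char| "&##{char.ord};" }
end

puts html_obfuscate_compact("abc@example.com")
```

It skips the lookup arrays because a letter's character code is already the number the entity needs.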

Now, in our js_antispam_email_link method, we can “encrypt” the user and domain before sending them to the browser like so:

def js_antispam_email_link(email, linktext=email)
  user, domain = email.split('@')
  user = html_obfuscate(user)
  domain = html_obfuscate(domain)
  ...

Not bad, but many spiders these days can still decode HTML entities and get at that address, so let’s build up our defenses a bit more by adding some methods to really screw with those spiders.
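
To show how little work that decoding takes, here is a rough sketch of what a smarter spider might run over the page source. The `decode_numeric_entities` method is a stand-in I made up, and it only handles decimal entities for ASCII characters.

```ruby
# Hypothetical decoder, standing in for what a smarter spider might do.
# Only handles decimal entities for ASCII characters (codes below 256).
def decode_numeric_entities(html)
  html.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr }
end

encoded = "&#097;&#098;&#099;&#064;&#101;&#120;&#097;&#109;&#112;&#108;&#101;&#046;&#099;&#111;&#109;"
puts decode_numeric_entities(encoded)
```

One regex is enough to undo the entity encoding, which is why it can't be the only layer.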

We’ll write a method that encrypts a string with ROT13 and puts that on the webpage, and use some JavaScript to decrypt it on page display. ROT13 is a really simple cipher where you take the letters a-z and shift each one by 13 places, half the alphabet, so applying it twice gives back the original text.

This is a really simple one-liner borrowed from Jay Komineck:

# Rot13 encodes a string
def rot13(string)
  string.tr "A-Za-z", "N-ZA-Mn-za-m"
end
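
A quick sanity check, repeating the one-liner so the snippet runs on its own: encoding twice should hand back the original address, which is what lets the browser undo the cipher with the same shift.

```ruby
# Same one-liner as above, repeated so this snippet is self-contained.
def rot13(string)
  string.tr "A-Za-z", "N-ZA-Mn-za-m"
end

encoded = rot13("abc@example.com")
puts encoded          # letters are shifted; @ and . are untouched
puts rot13(encoded)   # applying ROT13 again restores the original
```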

Let’s use this to really beef up our link helper by adding some JavaScript that can decipher it. JS code taken from Allan Odgaard:

var string = '#{rot13(email)}'.replace(/[a-zA-Z]/g, 
  function(c){ 
    return String.fromCharCode(
      (c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26
    );
  });
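
The arithmetic in that callback is a bit dense, so here is my reading of it transcribed into Ruby. The method name `rot13_js_style` is made up for illustration; the helper itself still uses the tr one-liner.

```ruby
# Ruby transcription of the JavaScript decoder, for illustration only.
def rot13_js_style(string)
  string.gsub(/[a-zA-Z]/) do |c|
    limit   = c <= 'Z' ? 90 : 122   # char code of 'Z' or 'z'
    shifted = c.ord + 13            # move 13 places forward
    # Wrap back around the alphabet if we ran past the last letter.
    (shifted <= limit ? shifted : shifted - 26).chr
  end
end

puts rot13_js_style("nop@rknzcyr.pbz")
```

Each letter is pushed forward 13 places and pulled back by 26 if it runs past Z (or z), which gives the same mapping as the tr version.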

Now we’ve got some pretty strong defense against those pesky robots. Because it uses only simple HTML character encoding and lightweight ROT13 ciphering, it shouldn’t be too taxing on your webserver to spit out a page with a few emails on it. Less sophisticated browsers still get the contact info, and everyone is a little bit happier to come home to a (relatively) clean inbox.

Here’s the whole shebang put together. Put this in application_helper.rb if you’re using Rails:

# Rot13 encodes a string
def rot13(string)
  string.tr "A-Za-z", "N-ZA-Mn-za-m"
end
 
# HTML encodes ASCII chars a-z and A-Z, useful for obfuscating
# an email address from spiders and spammers
def html_obfuscate(string)
  output_array = []
  lower = %w(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  upper = %w(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)
  char_array = string.split('')
  char_array.each do |char|  
    output = lower.index(char) + 97 if lower.include?(char)
    output = upper.index(char) + 65 if upper.include?(char)
    if output
      output_array << "&##{output};"
    else 
      output_array << char
    end
  end
  return output_array.join
end
 
# Takes in an email address and (optionally) anchor text,
# its purpose is to obfuscate email addresses so spiders and
# spammers can't harvest them.
def js_antispam_email_link(email, linktext=email)
  user, domain = email.split('@')
  user   = html_obfuscate(user)
  domain = html_obfuscate(domain)
  # if linktext wasn't specified, throw encoded email address builder into js document.write statement
  js_linktext = linktext == email ? "'+'#{user}'+'@'+'#{domain}'+'" : linktext
  rot13_encoded_email = rot13(email) # obfuscate email address as rot13
  # js disabled browsers see this; linktext is only repeated when it is custom text
  fallback = linktext == email ? "" : "#{linktext}<br/>"
  out =  "<noscript>#{fallback}<small>#{user}(at)#{domain}</small></noscript>\n"
  out += "<script type='text/javascript'>\n"
  out += "  <!--\n"
  out += "    var string = '#{rot13_encoded_email}'.replace(/[a-zA-Z]/g, function(c){ return String.fromCharCode((c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26);});\n"
  out += "    document.write('<a href='+'ma'+'il'+'to:'+ string +'>#{js_linktext}</a>'); \n"
  out += "  //-->\n"
  out += "</script>\n"
  return out
end
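
One caveat: custom link text gets dropped into a single-quoted JavaScript string, so an apostrophe in it (as in "Bob's email") would end the string early and break the script. Here is a minimal sketch of one way to guard against that. The `escape_for_js_string` helper is hypothetical and not part of the code above, and in a Rails view the built-in `escape_javascript` helper may be a better fit.

```ruby
# Hypothetical helper (not in the original post): backslash-escapes
# backslashes and single quotes so they cannot end a '...' JS string early.
# It does not handle newlines or a literal </script> in the text.
def escape_for_js_string(text)
  text.gsub(/[\\']/) { |match| "\\#{match}" }
end

puts escape_for_js_string("Bob's email")
```

This should only be applied to custom link text, since the default address builder relies on its quote marks to split the address into pieces.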

I hope this helps somebody out there. Please leave a comment if you have any suggestions.
