Friday, December 12, 2008

Building a profanity filter with

One of the disadvantages of a web site that relies heavily on user-generated content is unwanted content and profanity. Since there will always be more users than moderators, you will have to rely on community policing to take down unwanted content. But rather than relying only on community policing (which might work well for a Web 2.0 site), you might also want to build a basic profanity filter for your web site or blog (where the blog administrators are the only moderators). On the .NET 2.0 site we have been working on, we had to build a basic profanity filter using a custom dictionary. Since the only places where the filter needs to run are text inputs and textareas (the TextBox control), I could think of three approaches:

  1. Sanitize the user input on every page, i.e. remove the unwanted text for every textbox on every page by calling a method in a class library. This approach is the least scalable of all: it requires every developer to call the method diligently, and it adds needless code bloat.
  2. Extend the textbox control and create your own control, which internally calls the sanitize method in the getter of the overridden Text property. Developers are then only required to use the extended textbox instead of the base textbox on their pages. You can also add properties such as SanitizeText (bool), which can be set to false in cases where you don't want the text to be sanitized. Also, you might want to check that the textbox is not of Password type before running it through your profanity filter!
  3. Use tag mapping to substitute the base textbox with the extended textbox from approach 2, without touching the existing pages. Tag mapping works great when you are already in the middle of your development cycle and have to add such logic after the fact.
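
Approaches 2 and 3 could be sketched roughly as follows. This is only an illustration, not our production code: the names `SanitizedTextBox`, `MySite.Controls`, and `ProfanityFilter.Sanitize` are made up, and the filter here is a stub standing in for whatever sanitize method your class library exposes.

```csharp
using System;
using System.Web.UI.WebControls;

namespace MySite.Controls // hypothetical namespace
{
    // Stub standing in for the real filter in your class library (hypothetical).
    public static class ProfanityFilter
    {
        public static string Sanitize(string input)
        {
            return input; // a real implementation would remove unwanted words
        }
    }

    // Extended textbox: sanitizes its text whenever the Text property is read.
    public class SanitizedTextBox : TextBox
    {
        // Set to false where the raw text must be preserved.
        // Defaults to true; the value is kept in ViewState across postbacks.
        public bool SanitizeText
        {
            get
            {
                object value = ViewState["SanitizeText"];
                return value == null || (bool)value;
            }
            set { ViewState["SanitizeText"] = value; }
        }

        public override string Text
        {
            get
            {
                string raw = base.Text;

                // Never run passwords through the filter.
                if (!SanitizeText || TextMode == TextBoxMode.Password)
                {
                    return raw;
                }
                return ProfanityFilter.Sanitize(raw);
            }
            set { base.Text = value; }
        }
    }
}
```

For approach 3, existing pages keep their `<asp:TextBox>` markup, and a `tagMapping` entry in web.config swaps in the derived control when the pages are compiled. The mapped type has to derive from the type it replaces. The assembly name below is an assumption:

```xml
<system.web>
  <pages>
    <tagMapping>
      <add tagType="System.Web.UI.WebControls.TextBox"
           mappedTagType="MySite.Controls.SanitizedTextBox, MySite.Controls" />
    </tagMapping>
  </pages>
</system.web>
```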

Given that we had to implement the profanity filter after the fact, we went with approach 3, and so far it has worked great for us. By the way, the profanity filter was built in-house with a custom dictionary (since we needed it for multiple languages): we run the user-entered text against the custom dictionary and do a simple Regex.Replace.
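
A dictionary-driven Regex.Replace along those lines might look like the sketch below. The class name, the tiny word list, and the choice to mask matches with asterisks are all assumptions for illustration; a real dictionary would be loaded per language from a file or database, and the filter could just as well remove the words instead of masking them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DictionaryProfanityFilter // hypothetical name
{
    // Placeholder word list; a real one would be loaded per language.
    private static readonly string[] BlockedWords = { "darn", "heck" };

    // One alternation pattern built from the whole dictionary.
    private static readonly Regex Pattern = BuildPattern(BlockedWords);

    private static Regex BuildPattern(string[] words)
    {
        string[] escaped = new string[words.Length];
        for (int i = 0; i < words.Length; i++)
        {
            // Escape each entry so dictionary words are matched literally.
            escaped[i] = Regex.Escape(words[i]);
        }

        // \b restricts matches to whole words, so "heckle" is left alone.
        return new Regex(
            @"\b(" + string.Join("|", escaped) + @")\b",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Replace each match with asterisks of the same length.
        return Pattern.Replace(input, delegate(Match m)
        {
            return new string('*', m.Length);
        });
    }
}
```

With this word list, `DictionaryProfanityFilter.Sanitize("What the HECK?")` returns `"What the ****?"`. The `\b` word-boundary check assumes the language separates words with spaces or punctuation, so languages written without word separators would need a different matching strategy.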