Tuesday, May 22, 2012

MongoDB Tidbits - PHP, Mongo Collections and ODM

So I jumped onto the NoSQL (Not Only SQL) bandwagon and of late have been dabbling with MongoDB and PHP. Coming from an RDBMS background, NoSQL is for sure a different paradigm and requires thinking differently. This series of posts covers my experience with MongoDB, the problems that I’ve faced and the solutions/workarounds that I’ve found so far. All the code samples in these posts were run on the 64-bit Linux version of MongoDB 2.0.3 with the PECL Mongo driver 1.2.9. These posts assume basic familiarity with MongoDB.
Background
The reason I even started looking outside of RDBMS was the loosely defined schema that I had to support. Basically, on our website a user can submit activities; these activities can be of different types, and the activity attributes change based on the activity type. We started off with a MySQL DB and soon realized that supporting different schemas in an RDBMS was a bit painful. The most popular approach for modeling this kind of schema seems to be EAV (as used by Magento), but I found it a bit too complex. We actually started serializing our entities into XML and dumping them into a MySQL column, but I realized that by doing so I was not getting any of the benefits of using an RDBMS (no referential integrity etc.).
Why Mongo?
Once I decided that RDBMS was a no-go, I started looking for alternatives in the NoSQL world. I had tried playing with Cassandra and Hadoop/HDFS earlier but felt they had a steep learning curve. Also, I feel Cassandra and Hadoop/HDFS are more suitable for applications dealing with huge amounts of data and complex processing, given their distributed nature and great support for Map-Reduce. I finally evaluated MongoDB, CouchDB and Redis. I found Redis to be a glorified key-value store and MongoDB to be the closest to an RDBMS (you can have indexes and dbrefs) without being an RDBMS, which makes it a little easier for people with an RDBMS background to learn.
Mongo Schema Design
After picking Mongo and installing it, the first choice that you have to make is the schema design and how to define relationships between entities. There are two ways in which you can define relationships:

  1. Embedding: A document becomes a subdocument of another document. You can embed as many levels deep as you wish. Warning: embedding generally works great if you are embedding only one level deep, as Mongo currently does not have good support for querying/updating attributes that are nested multiple levels deep. More on this in a subsequent post, where I ran into the issue with the positional $ operator.
  2. Linking: Documents are separate entities (part of different collections) and are linked by their MongoIds (_id). Enforcing referential integrity is primarily the responsibility of the client application.
Based on the docs, I followed the same principle: if the relationship between two entities is “Composition”, I embed the sub-entity as a subdocument; otherwise I add it to a different collection and link the two using a MongoId. Below is one example of composition and linking:
> db.activities.find({}).pretty();
{
    "activityTitle" : "This is example",
    "tags" : ["tag1", "tag2", "tag2"],
    "submittedBy" : 23,
    ...
}
Here tags is embedded in the activity (a 1:n relationship), whereas submittedBy holds a reference to the id of the user stored in db.users. A tag by itself can be a complex object, i.e. it can have its own attributes, like tags:[{tagId:1,name:"tag1",submittedBy:23},…].
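The difference between the two styles can be sketched in PHP without touching the driver. This is only an illustration: plain arrays stand in for the activities and users collections, and the user name and the resolveUser() helper are made up for the example.

```php
<?php
// Hypothetical data: a plain array stands in for the users collection.
$users = array(
    23 => array('_id' => 23, 'name' => 'alice'),
);

$activity = array(
    'activityTitle' => 'This is example',
    // Embedding: the tags live inside the activity document itself.
    'tags' => array(
        array('tagId' => 1, 'name' => 'tag1', 'submittedBy' => 23),
        array('tagId' => 2, 'name' => 'tag2', 'submittedBy' => 23),
    ),
    // Linking: only the id of the user is stored here.
    'submittedBy' => 23,
);

// Resolve the link by hand. Nothing enforces that the target exists,
// so the caller has to cope with a dangling reference (null).
function resolveUser(array $users, array $activity)
{
    $id = $activity['submittedBy'];
    return isset($users[$id]) ? $users[$id] : null;
}

$submitter = resolveUser($users, $activity);

// Embedded data needs no second lookup.
$tagNames = array();
foreach ($activity['tags'] as $tag) {
    $tagNames[] = $tag['name'];
}

echo $submitter['name'], ' tagged: ', implode(', ', $tagNames), "\n";
```

The embedded tags come back with the activity in one read, while the linked user costs a second lookup and can be missing, which is the integrity burden that linking puts on the application.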

That’s pretty much it on the schema design; currently our schema is pretty straightforward, with collections for “top-level” entities like activities, users and keywords (more on this later) and subdocuments for sub-entities like tags.
One caveat: Mongo field names (the equivalent of column names) are case sensitive, which is different from MySQL. Quite a few times in the beginning, I had a field name in the wrong case and wasted a bunch of time trying to figure out what was wrong with my query.
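PHP array keys are case sensitive in the same way, so this kind of mistake can be caught on the PHP side before the query is sent. A minimal sketch, assuming a hypothetical helper named findFieldCaseMismatch() that is not part of any driver:

```php
<?php
$activity = array('activityTitle' => 'This is example', 'submittedBy' => 23);

// Hypothetical helper: returns the correctly cased field name when $field
// matches an existing key only case-insensitively, otherwise null.
function findFieldCaseMismatch(array $document, $field)
{
    if (array_key_exists($field, $document)) {
        return null; // exact match, nothing to report
    }
    foreach (array_keys($document) as $key) {
        if (strcasecmp((string) $key, $field) === 0) {
            return $key;
        }
    }
    return null;
}

// 'activitytitle' is not the same key as 'activityTitle'.
$exists = array_key_exists('activitytitle', $activity);
$suggestion = findFieldCaseMismatch($activity, 'activitytitle');
```

Here $exists is false and $suggestion holds the properly cased name, which is usually enough of a hint to spot the typo.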
ODM (Object Document Mapper) Strategy
Once the Mongo collections are finalized and the PHP DTO (Data Transfer Object)/model classes are defined, the next step is to figure out how to map our models to Mongo collections and how to retrieve/persist them. The PECL driver is pretty flexible but only supports retrieving/persisting PHP associative arrays, so there has to be an adapter in between that converts our PHP classes into associative arrays. There are a few mappers available already, like Doctrine and Php-ODM, but somehow I wasn’t very comfortable with them: Doctrine seemed a bit too heavy, whereas Php-ODM relies on storing properties in an internal array. What we wanted was a way to store protected variables and have getters and setters, so that we can validate the values and typecast them. We also needed to be able to return only a subset of properties (in case of a partial update). Our implementation is pretty simple: every model inherits from BaseModel, which has a method toArray(). The toArray() method just calls get_object_vars() and takes two optional arrays as parameters: includeItems and excludeItems. Below is the implementation of our toArray() method –

Our models have protected variables for the properties that need to be serialized, and override the toArray() method if needed.
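A minimal sketch of the approach described above; it is not our original code. The exact semantics of includeItems and excludeItems (a whitelist applied first, then a blacklist) are an assumption, and the Activity model with its setters is a made-up example.

```php
<?php
class BaseModel
{
    /**
     * Assumed semantics: a non-empty $includeItems keeps only those
     * properties; $excludeItems is then removed from the result.
     */
    public function toArray(array $includeItems = array(), array $excludeItems = array())
    {
        // Called from inside the hierarchy, get_object_vars() returns the
        // public and protected properties; private properties declared in
        // a subclass are not visible from here and are left out.
        $values = get_object_vars($this);
        if (!empty($includeItems)) {
            $values = array_intersect_key($values, array_flip($includeItems));
        }
        return array_diff_key($values, array_flip($excludeItems));
    }
}

// Hypothetical model: setters validate and typecast before storing.
class Activity extends BaseModel
{
    protected $activityTitle;
    protected $submittedBy;
    private $dirty = false; // private, so never serialized

    public function setActivityTitle($title)
    {
        $this->activityTitle = trim((string) $title);
        $this->dirty = true;
    }

    public function setSubmittedBy($userId)
    {
        $this->submittedBy = (int) $userId;
        $this->dirty = true;
    }
}

$activity = new Activity();
$activity->setActivityTitle('  This is example ');
$activity->setSubmittedBy('23');

$full = $activity->toArray();                           // every protected property
$partial = $activity->toArray(array('activityTitle'));  // subset for a partial update
```

The $partial array is the kind of value that could be handed to the driver as the payload of a $set update, while $full suits an insert of the whole document.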


That's pretty much it for our ODM.

Thursday, March 08, 2012

Still searching...

Another year, same day when I think about where I am, what I am doing and what I want to achieve. Another year, same day and I still haven't found answers to any of these rhetorical questions. Sometimes, I do wonder if I'm even being reasonable, but then "what's rhyme or reason to a fool or a dreamer?" Who have I been these years: a fool, a dreamer, maybe both? Anyway, reasoning is a relative term. I know I could convince others based on my reasoning if they were at least willing to listen.
I'll keep waiting for answers...there's nothing else for me to do, anyway.