About 10 years ago when I first learned about ORM (object-relational mappers), I was so stoked. In the past, for any web applications I’ve built with a relational database, I had to write lots of SQL queries. This became annoying especially when a lot of the information I needed from the database were simple select statements and it was tiring to type out all the SQL queries for everything I needed for my app. I would copy and paste different SQL queries and make minor tweaks for different features. This was mundane, repetitive, and extremely boring.
Then, when I learned Ruby on Rails back in 2011 and got exposed to other ORM libraries, I was really excited. This basically meant that I didn’t have to write those simple SQL queries over and over again but simply have my model do all the work!
Even better, this meant, I could easily switch a database from one to another without having to worry about all the syntax differences between each other. Life seemed like a bliss and I was amazed as I was able to build applications a lot faster.
I loved how these models worked so much but I became a fan of this approach and told everyone I knew about ORM (this is when not that many people knew about Rails and other frameworks haven’t really caught up to this movement).
Now, that I’ve done ORM for many years now, however, I am realizing that many developers, especially those new, are not thinking about the database load of ORM and not using ORM for what it was intended but using it for everything they need (even for things that they shouldn’t be using ORM for).
Developers who are not thinking about database load often build things in ORM only to have this put a huge toll on the database, especially as our data gets larger and as we are also doing lots of group by statements and pulling reports.
For example, consider that you had two models in your database: Post and Comment (where a single post can have many comments). Say on the particular page that you’re rendering, you needed to display 100 posts and all the comments associated with each post. When developers are using ORM for this, often times, you see them prepare the posts in the controller and then in the view file, do something like this:
{% for post in posts %} <p>Post: {{ post.message }}</p> {% for comment in post.comments.all() %} <p>Comment: {{comment.comment}}</p> {% endfor %} <p>{{comment.comment}}</p> {% endfor %}
If you needed to display 1000 posts on that page, how many sql queries would be run to render this page? What if each comment had a sub-comment and your code was doing something like this?
{% for post in posts %} <p>Post: {{ post.message }}</p> {% for comment in post.comments.all() %} <p>Comment: {{comment.comment}}</p> {% for subcomment in comment.subcomments.all() %} <p class='sub_comment'>Sub-Comment: {{subcomment.comment}}</p> {% endfor %} {% endfor %} <p>{{comment.comment}}</p> {% endfor %}
This is now a nightmare. These scenarios are however very common and it pains me to see developers (including some of my own developers) who fall into the trap of ORM and relying on that too much, when they really should not. The example above is for example doing an algorithm exercise where you could do things in O(N) time but your code is actually doing things in O(N^3). As much as we hate inefficient algorithm, we should also abhor when we see ORM put into practice as above.
Another danger of ORM
The fact that ORM does all the server validations for you with just a few lines in the model is really awesome. A few more lines in the model and with some other gems/libraries/modules, you could even have it render all the proper server side validation errors or success messages with just a few more lines. Life is all beautiful right?
These beautiful things of ORM however introduce some interesting behavior. For things where you could just write a SQL query in 30 seconds, to learn how to do that using ORM, when you have to do “GROUP BY” or “JOIN” or “LIMIT” or “HAVING”, etc, becomes increasingly complex, often hours of research and tweaking. In fact, I have seen many developers pour through documentation to learn how to do something in ORM and spending multiple hours. When I tell them that they should just write a raw sql query for that, they become stubborn and try to still find an ORM way to do it. Later they figure out how to do it using ORM but 1) it took them a long time to learn how to do it using ORM and 2) the way they wrote it in ORM is so confusing that it’s not even clear what it’s doing. In fact, sometimes the ORM way is even longer than if they just wrote a raw SQL query! I feel bad for the next developer who would have to modify any of the code there.
Concluding Thought
After seeing developers fall into these traps so often, now I tell my developers not to use ORM for anything unless it’s very simple statements. As even select statements could introduce database overload (as shown in the first example), even that, I am weary about the developers following that. What makes me sad is when I see developers who know ORM but who don’t know how to do raw SQL queries. That’s purely sad and all good developers must know both SQL queries as well as how ORM works and when it should be used. I think we must also remember that after all ORM is just another code/library built by other developers to convert some simple operations that were repetitive as well as reduce some of the repetitive operations that we as developers had to do (such as server side validations, handling typical errors, performing simple CRUD operations). Let’s not forget what it was designed for and not use ORM for things where it really wasn’t designed for.
Let’s also be especially careful when we’re using ORM to render a page and always check the # of queries that were executed to render that page (and ask yourself, could you have done it with less queries)? With this in mind, we’re on a path to producing good quality code that also performs well.
What do you think?
Do you agree/disagree with my comments above? What has been your experience with ORM so far, particularly as it relates to performance?