Portrait of Christopher

Christopher Bennell

I’m a full-stack web developer specializing in Ruby on Rails and Education Technology. Get in touch.

Filtering by Association Count in Complex Models in Rails

I recently tried to optimize a slow page, and bumped into some limitations in the way Rails works with complex associations.

The models I’m working with are Courses, Users, and Enrollments. There is a many-to-many relationship between Courses and Users, with Enrollments acting as the join model. Enrollments also have a role, indicating if the user is a student, a teacher, etc. I’ve implemented some unconventional has_many relationships, which I may explain in a future post.

app/models/course.rb
1
2
3
4
5
6
7
8
class Course < ApplicationRecord
  has_many :enrollments
  has_many :teacher_enrollments, -> { teacher.eager_load(:user) }, class_name: "Enrollment"
  has_many :student_enrollments, -> { student.eager_load(:user) }, class_name: "Enrollment"
  has_many :users, through: :enrollments, inverse_of: :courses
  has_many :teachers, through: :teacher_enrollments, source: :user
  has_many :students, through: :student_enrollments, source: :user
end
app/models/user.rb
1
2
3
4
class User < ApplicationRecord
  has_many :enrollments
  has_many :courses, through: :enrollments, inverse_of: :user
end
app/models/enrollment.rb
1
2
3
4
5
6
7
class Enrollment < ApplicationRecord
  belongs_to :course
  belongs_to :user

  scope :teacher, -> { where(role: "teacher") }
  scope :student, -> { where(role: "student") }
end

The Problem

My view needs to display a large list of courses, along with the teacher’s name. It also needs to not show courses with zero enrolled students.

My initial, naive, slow version looked something like this:

1
2
3
4
5
6
7
# SLOW
Course
  .includes(
    :teachers,
    :students
  )
  .select { |c| c.students.any? }

The issue arose because I was loading a large set of student data just to check for student enrollment, which significantly slowed down the query: it had to load all the enrollments and user objects, even though I wasn’t using any of that information.

I tried various approaches to solving this. The most promising seemed to be Rails’ where.associated, which is the inverse of where.missing. These methods are supposed to “let you select a set of records based on the presence or absence of an association”, which seemed ideal. However, applying this did not give the desired results. I tried both

1
2
3
4
# Naive approach: loads all students and teachers, then filters courses with students.
Course
  .where.associated(:students)
  .includes(:teachers)

and

1
2
3
Course
  .where.associated(:student_enrollments)
  .includes(:teachers)

Both of these queries resulted in very large result sets, with duplicated course records in the response, likely due to issues with the SQL join caused by the “through” associations. I didn’t dig into the details, but I’m curious why this happened; perhaps it’s an opportunity to contribute an enhancement to Rails.

The Solution

Since the Rails built-in solution didn’t work, I turned to SQL. The SQL I needed ended up looking like this:

1
2
3
4
5
6
7
8
9
SELECT
  courses.*,
  (
    SELECT count(*)
    FROM enrollments
    WHERE enrollments.course_id = courses.id and enrollments.role = 'student'
  ) as student_count
FROM courses
WHERE student_count > 0

I’m not an SQL expert; it’s possible that a JOIN could be faster.

Now, how to express that in my Rails models? It turns out that scopes work nicely for this. I settled on using two scopes to express the two different elements of this query: computing the student_count and filtering for courses with students. This makes the code a bit more clear and would allow me to reuse the student count in other contexts.

app/models/course.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Course < ApplicationRecord
  has_many :enrollments
  has_many :teacher_enrollments, -> { teacher.eager_load(:user) }, class_name: "Enrollment"
  has_many :student_enrollments, -> { student.eager_load(:user) }, class_name: "Enrollment"
  has_many :users, through: :enrollments, inverse_of: :courses
  has_many :teachers, through: :teacher_enrollments, source: :user
  has_many :students, through: :student_enrollments, source: :user

  # Adding scopes to filter courses by student count
  scope :with_student_count, -> {
    select("
      courses.*, (select count(*)
      from enrollments
      where enrollments.course_id = courses.id and enrollments.role = 'student') as student_count")
  }
  scope :has_students, -> { with_student_count.where("student_count > ?", 0) }
end

Now we can use the has_students scope in our original query:

1
2
3
Course
  .has_students
  .includes(:teachers)

This change significantly reduced the response time on my page load.

I always reach for the Rails built-in methods like where.associated when I can, but sometimes complex models and specific requirements necessitate more custom solutions. In this case, leveraging raw SQL within Rails scopes provided the flexibility and efficiency needed to solve the problem of filtering courses by student enrollment count without loading unnecessary data. The solution—using a SQL subquery in combination with Rails scopes—demonstrates how Rails allows us to seamlessly integrate SQL for performance optimizations, keeping the code clean and reusable while significantly improving response times.