Memgraph Query Optimization – Tips and Helpful Hints

Recently, I realized my software team’s Memgraph query was taking longer than it reasonably needed to. I did some research, and in addition to several suggestions on Memgraph’s blog, I found a couple of other culprits. In combination, addressing them allowed me to cut our query time in half.

Background

Among other types of nodes, the graph contains nodes representing physical assets such as apartment buildings and the floors and units inside those buildings. The graph has about 35,000 physical asset nodes connected by about 30,000 directed relationships. A unit is CONTAINED_IN a floor, which is CONTAINED_IN a building. When traversed as undirected, each edge can be followed in both directions, effectively doubling the search space to roughly 60,000 relationship traversals.

Although the examples focus on a single domain, the lessons apply broadly to Memgraph and Cypher-style graph queries. I ran EXPLAIN and PROFILE on these queries to inspect their execution plans, but for simplicity I'll report absolute query times.
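If you want to inspect the plans yourself, prefix a query with EXPLAIN (plan only) or PROFILE (plan plus per-operator statistics from an actual run). A minimal sketch using the labels from this post:

```cypher
// Show the planned operators without executing the query
EXPLAIN MATCH (unit:Physical {type: "unit"}) RETURN unit.id;

// Execute the query and report how many rows each operator produced
PROFILE MATCH (unit:Physical {type: "unit"}) RETURN unit.id;
```

PROFILE is the more useful of the two when hunting for the operator that dominates runtime, since it reports actual hit counts rather than estimates.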

Expressing Relationships Effectively

The way relationships are expressed (directed, undirected, variable-length, explicit) has a huge impact on Cypher queries. Two queries can yield exactly the same result, yet one can be far more efficient than the other.

Directing Your Edges

As I mentioned, our graph has about 30,000 directed relationships between physical assets. If we don’t leverage this directionality, the number of relationships doubles, and our query stands to suffer quite a bit. Removing directionality from our query helped me understand just how much of an impact this would have.

Undirected (19.61s)

MATCH (unit:Physical {type: "unit"})-[:CONTAINED_IN*]-(building:Physical {type: "building"})
RETURN unit.id

Directed (~25ms)

MATCH (unit:Physical {type: "unit"})-[:CONTAINED_IN*]->(building:Physical {type: "building"})
RETURN unit.id

The only difference here is a tiny arrow, yet it cuts over 19 seconds off the query.

Specifying How Far to Traverse

The next experiment I tried was adding explicit traversal depth. Variable-length traversals (*) can be useful but expensive, because the planner must consider paths of all lengths. In our case, I knew that a unit is always 2 relationships away from a building, so we can tell Memgraph ahead of time how many relationships to traverse.

Directed with fixed depth (~21–23ms)

MATCH (unit:Physical {type: "unit"})-[:CONTAINED_IN*2]->(building:Physical {type: "building"})
RETURN unit.id

Adding a fixed depth shaved off a few milliseconds. In our real query, which matches significantly more nodes and relationships, the savings were more noticeable.

Undirected with fixed depth (~952ms)

MATCH (unit:Physical {type: "unit"})-[:CONTAINED_IN*2]-(building:Physical {type: "building"})
RETURN unit.id

For fun, I tried our undirected query with a fixed depth. It helped a lot, but remained unreasonably slow. The planner still had two directions to consider at each hop.

Explicit Pattern Expansion

In cases where the number of hops is known, it can be more efficient to write out the path rather than rely on variable-length traversal.

Explicit pattern (~19–20ms)

MATCH (unit:Physical {type: "unit"})-[:CONTAINED_IN]->(:Physical)-[:CONTAINED_IN]->(building:Physical {type: "building"})
RETURN unit.id

This approach avoids both direction ambiguity and the planning overhead associated with *n patterns. Performance was consistently in the same range as the fast directed queries.

Scoping Your Optional Calls

OPTIONAL MATCH clauses can unintentionally broaden the search space when they rely on variables defined outside their immediate scope. Moving them into a subquery confines evaluation and results in more predictable performance.

Unscoped version (~26ms)

MATCH (unit:Physical {type: "unit"})
OPTIONAL MATCH (unit)<-[:TENANT_OF]-(tenant:Identity)
RETURN unit.id, collect(tenant.id)

The unscoped optional match forces the planner to aggregate over a much larger intermediate result. This only gets worse when the query matches several other nodes before reaching the tenant match.

OPTIONAL MATCH expands each unit to all of its matching tenant nodes. You now have roughly (number of units) × (average number of tenants per unit) rows.

RETURN unit.id, collect(tenant.id) is an aggregation, so Cypher has to materialize that big expanded row set and then perform a global aggregation over it (group by unit.id, collect all tenant.id values).
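To make the row arithmetic concrete, here is a small Python sketch. The counts are made up for illustration; they are not the real numbers from our graph:

```python
# Hypothetical counts to illustrate the intermediate row explosion.
num_units = 10_000          # units matched by the outer MATCH
avg_tenants_per_unit = 3    # average OPTIONAL MATCH expansions per unit

# Unscoped version: every (unit, tenant) pair becomes a row, and the
# final aggregation must group this entire set by unit.id.
unscoped_rows = num_units * avg_tenants_per_unit

# Scoped subquery version: aggregation happens inside the CALL block,
# so the outer query only ever sees one row per unit.
scoped_rows = num_units

print(unscoped_rows)  # 30000
print(scoped_rows)    # 10000
```

The absolute numbers matter less than the ratio: the unscoped plan materializes and groups a row set that grows with the average fan-out, while the scoped plan stays flat at one row per unit.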

Scoped subquery version (~23ms)

MATCH (unit:Physical {type: "unit"})
CALL {
  WITH unit
  OPTIONAL MATCH (unit)<-[:TENANT_OF]-(tenant:Identity)
  RETURN collect(tenant.id) AS tenantIds
}
RETURN unit.id, tenantIds

This version aggregates per unit in its own scope, which usually means fewer rows, less memory, and a faster plan. While the absolute time didn't decrease much in this small example, it was one of the most impactful changes in our real query, which matches over 15 different nodes and relationships.
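Putting the pieces together, a combined sketch of the patterns above might look like the following. It uses only the labels and relationship types from the earlier examples; a real query would match more nodes:

```cypher
// Directed, fixed-depth traversal plus a scoped optional aggregation
MATCH (unit:Physical {type: "unit"})-[:CONTAINED_IN*2]->(building:Physical {type: "building"})
CALL {
  WITH unit
  OPTIONAL MATCH (unit)<-[:TENANT_OF]-(tenant:Identity)
  RETURN collect(tenant.id) AS tenantIds
}
RETURN unit.id, building.id, tenantIds
```

Each piece on its own is a small win; combined, they keep both the traversal and the aggregation tightly scoped.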

Lessons Learned

From these tests, several consistent themes emerged:

  • Directionality has the largest impact. Avoiding undirected edges can be the difference between seconds and milliseconds.

  • Constrain the traversal. Specifying hop counts or expanding the pattern explicitly leads to more efficient planning.

  • Scope optional logic. Subqueries prevent unnecessary broadening of the match space.

These Memgraph query optimization strategies improved the speed of key queries without requiring schema changes or data restructuring.
