In a recent project, my software development team migrated PDFs from a Blob Storage container to a binary field in an SQL Server database. Post migration, there was a noticeable drop in the web app’s performance. Endpoints that previously returned data in milliseconds now returned in seconds. I’ll walk through the strategies we used to improve query performance.
Split Queries: Preventing Data Duplication
The query logs showed that EF Core was generating an inefficient single query that led to data duplication (including the big binary field).
For instance, when querying DocumentRevision entities along with their associated Document and Keywords, EF Core was duplicating related entities for each result row. This caused a much larger dataset to be returned than needed, significantly slowing down performance.
// Before: Single query causing data duplication
var revisions = context.Revisions
.Include(r => r.Document)
.Include(r => r.Keywords)
.ToList();
// After: Using AsSplitQuery to avoid duplication
var revisions = context.Revisions
.Include(r => r.Document)
.Include(r => r.Keywords)
.AsSplitQuery()
.ToList();
In the second example, AsSplitQuery tells EF Core to run separate queries for Document and Keywords instead of combining everything into one large query. Although this results in multiple SQL queries, it avoids the costly cross-product, reducing data duplication and improving overall performance. After implementing this change, there was a significant reduction in response times.
For more details on split queries and their tradeoffs, check out the Microsoft documentation.
Using Projections and DTOs for Read-Only Data
By default, EF Core selects all columns for the entities you are querying. That can be inefficient when you only need a subset of the data. A better approach for read-only operations is to use projections, which allow you to select only the necessary columns by leveraging the Select method. This not only reduces the amount of data retrieved from the database but also improves performance by reducing memory usage and transfer time.
// Without projection: Fetching all columns
var revisions = context.Revisions
.Include(r => r.Document)
.Include(r => r.Keywords)
.ToList();
// With projection: Fetching only necessary columns using DTO
var revisionDtos = context.Revisions
.Select(r => new DocumentRevisionDto
{
RevisionNumber = r.RevisionNumber,
DocumentTitle = r.Document.Title,
Keywords = r.Keywords.Select(k => k.Keyword).ToList()
})
.AsNoTracking(). // Optimizing for read-only queries
.ToList();
In this example, the DocumentRevisionDto only includes the properties needed for the read operation (RevisionNumber, DocumentTitle, and Keywords). This reduces the amount of data loaded into memory and sent over the network. Additionally, using AsNoTracking tells EF Core that the retrieved entities are not being tracked, which improves performance by bypassing change detection logic.
For write operations, it’s recommended to work with the entity directly and let EF Core track changes to manage updates and relationships efficiently.
Optimizing with Indexes
Indexes are essential for improving query performance, especially when you are querying large datasets. By default, EF Core indexes primary keys, but when querying by non-primary key fields (such as in WHERE or JOIN clauses), you may benefit from creating additional indexes.
Let’s say you frequently query DocumentRevision entities by their DocumentId and RevisionNumber. You can create an index to speed up those queries:
CREATE INDEX IX_DocumentRevision_DocumentId_RevisionNumber ON DocumentRevisions(DocumentId, RevisionNumber);
Once this index is in place, queries that filter DocumentRevisions by DocumentId and RevisionNumber will perform faster since the database can quickly locate the relevant records using the index rather than scanning the entire table.
Optimizing EF Core Database Query Performance
EF Core provides powerful tools for querying and managing data, but it’s important to understand how the underlying SQL queries are generated to avoid performance bottlenecks. By using split queries, projections with DTOs, and proper indexing, we were able to significantly improve the performance of our web app after migrating PDFs into a SQL Server binary field.