Smarter ideas worth writing about.
blogimage_fundamentals_entityframework

Fundamentals: Entity Framework

Microsoft's Entity Framework (EF) is a powerful Object-Relational Mapper (ORM) that boosts developer productivity by automating much of the work associated with designing and interfacing with an application database. Although the framework is simple at first glance, no shortage of blog posts have been dedicated to describing its correct use -- and for good reason.

Like any complex framework, EF can introduce difficult-to-diagnose bugs and performance problems due to its misuse, particularly as projects grow -- either in scope or in scale. This post will discuss four guidelines for using EF effectively.

Always Disable Lazy Loading

Lazy loading is a feature in Entity Framework that allows you to defer loading of connected entities (via foreign keys) until they are accessed for the first time. There are two main requirements for lazy loading of a navigation property to work:

  1. The property in question must be marked ‘public virtual’
  2. The DbContext’s ‘Configuration.LazyLoadingEnabled’ must be set to true

The following is an example of a DbContext with lazy-load-enabled navigation properties:

public class OrganizationDbContext : DbContext
{
    public OrganizationDbContext()
        : base("name=OrganizationDbContext")
    {
        //Uncomment this line to disable lazy loading
        //Configuration.LazyLoadingEnabled = false;
    }

     public virtual DbSet<Organization> Organizations { get; set; }
     public virtual DbSet<Department> Departments { get; set; }
     public virtual DbSet<Employee> Employees { get; set; }
     public virtual DbSet<Log> Logs { get; set; }
}

public class Organization
{
    public int Id { get; set; }
    public string Name { get; set; }

    public virtual ICollection<Department> Departments { get; set; } //navigation property (note the 'virtual')
}

public class Department
{
    public int Id { get; set; }
    public string Name { get; set; }

    [ForeignKey("Organization")]
    public int OrganizationId { get; set; }

    public virtual Organization Organization { get; set; } //navigation property (note the 'virtual')
    public virtual ICollection<Employee> Employees { get; set; } //navigation property (note the 'virtual')
}

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Salaried { get; set; }

    [ForeignKey("Department")]
    public int DepartmentId { get; set; }
    public virtual Department Department { get; set; } //navigation property (note the 'virtual')
}

public class Log
{
    public int Id { get; set; }
    public string Severity { get; set; }
    public string Message { get; set; }
}

And here’s an example of lazy loading in action:

public class OrganizationRepository
{
    private readonly OrganizationDbContext _context;
    public OrganizationRepository(OrganizationDbContext context)
    {
        this._context = context;
    }

    public string GetDepartmentNameForEmployee(int id)
    {
        Employee emp = _context.Employees.Find(id); //SQL query executed here

        return emp.Department.Name; //SQL query also executed here since 'Department' was accessed
    }
}

In some scenarios, this feature grants our code a performance boost since it’s possible to conditionally avoid loading an entity’s contents from the database:

public class OrganizationRepository
{
    private readonly OrganizationDbContext _context;
    public OrganizationRepository(OrganizationDbContext context)
    {
        this._context = context;
    }

    public string GetDeparmentNameForSalariedEmployee(int id)
    {
        Employee emp = _context.Employees.Find(id);//1.  SQL query executed here

        if (emp.Salaried)
            return emp.Department.Name;//2. SQL query executed here since 'Department' was accessed
        else
            throw new InvalidOperationException("Employee is not salaried");//query '2' above was skipped (hooray!)
    }
}

However, incorrect use of this feature has serious performance consequences. The following code, while innocent at first glance, will actually execute a SQL query for each salaried employee in the database (plus the first query) – the infamous ‘N+1 Query’ problem. As the number of items in ‘salariedEmployees’ grows, the number of roundtrips to the database increases.

public class OrganizationRepository
{
    private readonly OrganizationDbContext _context;
    public OrganizationRepository(OrganizationDbContext context)
    {
        this._context = context;
    }

    public List<string> GetDepartmentsWithSalariedEmployees()
    {
        //get all salaried employees (1 query)
        List<Employee> salariedEmployees = _context.Employees.Where(x => x.Salaried).ToList(); 

       //return department names for these employees (1 query for each 'Department' access)
        return salariedEmployees.Select(x => x.Department.Name).Distinct().ToList();
    }
}

There are other problems, too. If a repository returns an entity that happens to outlive the DbContext from which it came, it’s possible that accessing a navigation property will throw a runtime exception. Usually this isn’t a problem in web applications since the lifetime of the DbContext is tied to the lifetime of the web request (using your favorite DI container!). However, in other types of applications (services, etc.) this can become a problem.

public static void Main()
{
    Employee emp;
    using(OrganizationDbContext context = new OrganizationDbContext())
    {
        OrganizationRepository respository = new OrganizationRepository(context);
        emp = respository.GetEmployee(1);
    }

    var department = emp.Department; //explosion -- context is already disposed
}

Lastly, if an entity is eventually serialized as part of an API response, lazy-loading will cause every property to be sequentially loaded from the database as part of the serialization (since serialization must touch every property). Usually this will lead to run-time errors since entities typically have circular references; in other cases, you’re left with performance problems down the road.

Aside: as pointed out by my colleague, serialization of CLR objects with circular references can be accomplished using additional third-party libraries. See this link.

The above examples can all be written with lazy loading disabled. Here's what that looks like:

public class OrganizationRepositoryNonLazy
  {
      private readonly OrganizationDbContext _context;
      public OrganizationRepositoryNonLazy(OrganizationDbContext context)
      {
          this._context = context;
      }

      public string GetDeparmentNameForSalariedEmployee(int id)
      {
          Employee emp = _context.Employees
              .Include(x => x.Department) //Make sure EF brings back department as part of this query
              .SingleOrDefault(x => x.Id == id);

          if (emp.Salaried)
              //lazy loading disabled so no nav-property query here
              //Department is not null since we 'Include'd it above
              return emp.Department.Name;
          else
              throw new InvalidOperationException("Employee is not salaried");
      }

      public string GetDepartmentNameForEmployee(int id)
      {
          Employee emp = _context.Employees
              .Include(x => x.Department) //Make sure EF brings back department as part of this query
              .SingleOrDefault(x => x.Id == id);

          //lazy loading disabled so no nav-property query here
          //Department is not null since we 'Include'd it above
          return emp.Department.Name; 
      }

      public List<string> GetDepartmentsWithSalariedEmployees()
      {
          //get all salaried employees (1 query)
          List<Employee> salariedEmployees = 
              _context.Employees
              .Include(x=>x.Department)//Make sure EF brings back department for each employee
              .Where(x => x.Salaried).ToList(); 


          //lazy loading disabled so no nav-property query here
          //Department is not null since we 'Include'd it above
          return salariedEmployees.Select(x => x.Department.Name).Distinct().ToList();
      }
  }

There’s a lot of debate surrounding the usefulness of lazy loading. For most code, the additional cost incurred by disabling lazy loading is simply a few calls to ‘Include’ (as seen above). However, some applications depend heavily on this functionality. In one design pattern, the repository returns entities directly, and the calling code accesses the navigation properties to retrieve what is needed from the database. In my opinion, this defeats the point of using an RDBMS since the join operator is completely avoided, and all data access is boiled down to single-table queries. In addition, this pattern causes code that has nothing to do with the DAL to directly cause database roundtrips. This kind of code is also (once again) less performant than loading the needed information up front. As Mehdi Khalili notes in this excellent blog post:

“Using the same method to fetch [an entity], without knowing about the context in which it is going to be used, is not code reuse; it is poor design. If you have several contexts or different needs then you need to have several explicit paths in your code where you know everything you need in advance and can load it in one go”

Given the subtlety of the problems introduced by lazy loading and the limited upside, it’s best to disable it.

Prefer Projection over Manual Mapping

Using the same repository above, consider these two repository methods that return a data-transfer object (DTO) representation of an organization:

public class OrganizationSummary
{
    public string Name { get; set; }
    public List<DepartmentSummary> Departments { get; set; }

}
public class DepartmentSummary
{
    public string Name { get; set; }
    public int EmployeeCount { get; set; }
}

public class OrganizationRepository
{
    private readonly OrganizationDbContext _context;
    public OrganizationRepository(OrganizationDbContext context)
    {
        this._context = context;
    }

    //mapped after the fact
    public OrganizationSummary GetOrganizationSummary(int id)
    {
        //note:  the 'Include' line below is VERY important
        //if lazy loading is on, 'Include' forces eager loading of Departments and Employees
        //if lazy loading is off, Departments and Employees will not load without 'Include'
        var org = _context.Organizations
            .Include(x => x.Departments.SelectMany(y => y.Employees))
            .Where(x => x.Id == id).SingleOrDefault();

        return new OrganizationSummary()
        {
            Name = org.Name,
            Departments = org.Departments.Select(y => new DepartmentSummary()
            {
                Name = y.Name,
                EmployeeCount = y.Employees.Count //note that all employees are loaded into memory
            }).ToList()
        };
    }

    //using projection
    public OrganizationSummary GetOrganizationSummary2(int id)
    {
        return _context.Organizations.Where(x => x.Id == id)
                .Select(x => new OrganizationSummary()
                {
                    Name = x.Name,
                    Departments = x.Departments.Select(y => new DepartmentSummary()
                    {
                        Name = y.Name,
                        EmployeeCount = y.Employees.Count //here, Employees do *not* get loaded into memory.
                    }).ToList()
                }).SingleOrDefault();
    }

}

The first loads the entity (as well as everything else we need via ‘Include’) in the first step, and then maps the result to the DTO which is then returned. The second uses projection to map the query directly to the object. There are two advantages to the projection approach:

First, EF is smart enough to notice that we’re using only the count of the ‘Employees’ collection in the projection (rather than the employees themselves). The generated SQL will not bring back properties for employees since we don’t need them. The performance gain due to this fact can be enormous – especially if Departments have many Employees. In contrast, the first technique loads all of the Employees into memory (and, in fact, there is no way to avoid doing so).

Secondly, the second version is easier to maintain. Suppose a new navigation property is added to Department, and a corresponding property is added to DepartmentSummary. In the second version of this function, the projection simply needs to be modified to set the DTO’s property. In the first version, doing the same will result in a null reference exception (or an N+1 situation if lazy loading is enabled) if the programmer forgets to ‘Include’ the related entity.

DbContext, SaveChanges, and Dependency Injection

The management of DbContext in the context of business transactions is a difficult topic. Mehdi El Gueddari has written the best post I've seen on the subject; I won't duplicate his efforts, but I'll try to summarize the important points below and present a way to avoid most of the mess.

We're assuming a typical architecture like this in the following discussion, where the DI container has set DbContext's lifetime to 'InstancePerRequest'. Note that in this architecture, the DbContext associated with the web request is injected into both Repository1 and Repository2.

What's wrong with the 'usual' architecture?

Non-Web Contexts

In non-web projects such as services or console apps, there is no web request -- so configuring your DI to use 'InstancePerRequest' for your DbContexts will throw an exception. As a result, we must configure DbContext's lifetime to be singleton or transient. Singleton won't work since DbContext isn't thread safe and the cache will quickly go stale (it's generally advised that DbContext's lifetime be kept short). Configuring DbContext to be transient, or instance-per-dependency, will cause Repository1 and Repository2 to receive different instances of DbContext. This can cause all sorts of issues, including:

  1. Atomicity is lost since the DbContexts' units of work are separate
  2. Entities returned from Repository1 and passed to Repository2 will not be 'attached' to Repository2's DbContext, leading to strange lost updates/inserts/deletes
  3. Two connections to the database are opened, prompting escalation to a distributed transaction if TransactionScope is used (more on that later)

Your service classes shouldn't require a web context to function correctly. If a standalone Windows service or scheduled task eventually makes its way into the project, you will want to re-use the DAL, but ignoring the above points will keep you from doing so. Even if you are sure your project will never require a non-web component, there are still problems.

Web Contexts

In this scenario, all repositories work on the same instance of DbContext -- the one associated with the web request. There are two main issues in this scenario:

  1. Implicitly using a shared DbContext leads to confusion over who needs to call SaveChanges
  2. Implicitly reusing a shared DbContext across multiple calls to service methods can lead to unexpected side effects

For the first point above: The first solution you may consider involves having each repository SaveChanges after each of its public methods. The problem is that doing so prevents you from composing two or more repository methods in a single service call without losing atomicity (since SaveChanges is called more than once). This greatly reduces the reusability of the repository's code. In addition, each repository shares the same DbContext, so calling SaveChanges in one repository can (rather unexpectedly) save changes made by another.

The second solution involves injecting the DbContext into the service class, and calling SaveChanges from there instead of inside the repository. Although this solution does allow you to compose repository methods while maintaining atomicity, it's still unsatisfactory from a clarity standpoint -- the service object is calling SaveChanges on an object it didn't seem to make changes to. Note that this solution makes the situation much, much worse in the non-web context (where each service/repository gets its own instance of DbContext). In that context the repositories aren't calling SaveChanges at all, and the service class is saving an empty set of changes.

For the second point above: Suppose the controller calls service method A() that makes some database changes and a runtime exception is thrown after DbContext has been modified but before SaveChanges is called -- presumably, A() failed to update the database. The controller catches and logs the exception, and calls a different service method B(). Once B() calls SaveChanges, A()'s partially complete changes are incorrectly committed to the database.

This bug is a consequence of the service class reusing the same instance of DbContext to perform different operations and is very difficult to detect -- it only occurs in exceptional cases. In all other cases, A() is atomic.

A better way

The following points need to be addressed:

  • Repositories must use a shared DbContext within a given service method so joint operations can be atomic
  • Repeated calls to the service class must use separate DbContexts in order to avoid the 'partial completion' bug described in the above section (#2)
  • Repositories must not call SaveChanges (so their methods are composable)
  • The service class must work in web and non-web contexts
  • No hard coded dependencies

I've included an example architecture that accomplishes all of the above below:

public class OrganizationSummary
public class ServiceClass
{
    private readonly OrganizationDbContextFactory _contextFactory;
    private readonly OrganizationRepositoryFactory _orgFactory;
    private readonly LogRepositoryFactory _logFactory;

    //factories for the DbContext and each needed repository.
    //This enables us to explicitly control creation of a per-operation context that's used by all repositories
    //without relying on 'Singleton' or 'Instance per Request'
    public ServiceClass(OrganizationDbContextFactory contextFactory, OrganizationRepositoryFactory orgFactory, LogRepositoryFactory logFactory)
    {
        _contextFactory = contextFactory;
        _orgFactory = orgFactory;
        _logFactory = logFactory;
    }
    public void Save()
    {
        using(var context = _contextFactory.Create())
        {
            //each repository works on the same context, and only within this block
            var logRepo = _logFactory.Create(context);
            var orgRepo = _orgFactory.Create(context);

            //neither of these calls SaveChanges, so they're safe to compose
            orgRepo.SetEmployeeSalaried(1);
            logRepo.LogMessage("INFO", "Employee 1 is now salaried");
            
            //one and only one call to SaveChanges
            //note that if an exception is thrown in this block, 
            //SaveChanges is not called and the DbContext is disposed, safely rejecting the in-progress changes
            context.SaveChanges(); 
        }
    }
}

public class OrganizationRepository : IOrganizationRepository
{
    private readonly OrganizationDbContext _context;
    private readonly OtherDependencies _dependencies;

    public OrganizationRepository(OrganizationDbContext context, OtherDependencies dependencies)
    {
        this._context = context;
        this._dependencies = dependencies;
    }

    //note there is *no* call to SaveChanges, so this method is composable with others
    public void SetEmployeeSalaried(int id)
    {
        var employee = _context.Employees.Find(id);
        employee.Salaried = true;
    }
}
public class LogRepository : ILogRepository
{
    private readonly OrganizationDbContext _context;
    private readonly OtherDependencies _dependencies;

    public LogRepository(OrganizationDbContext context, OtherDependencies dependencies)
    {
        this._context = context;
        this._dependencies = dependencies;
    }

    //note there is *no* call to SaveChanges, so this method is composable with others
    public void LogMessage(string severity, string message)
    {
        Log log = new Log();
        log.Severity = severity;
        log.Message = message;
        _context.Logs.Add(log);
    }
}

//factories for mockability, partial injection into repositories
public class OrganizationDbContextFactory : IOrganizationDbContextFactory
{
    public IOrganizationDbContext Create()
    {
        return new OrganizationDbContext();
    }
}
public class OrganizationRepositoryFactory : IOrganizationRepositoryFactory
{
    private readonly OtherDependencies _otherDependencies;

    public OrganizationRepositoryFactory(OtherDependencies otherDependencies)
    {
        _otherDependencies = otherDependencies;
    }
    public IOrganizationRepository Create(OrganizationDbContext context)
    {
        return new OrganizationRepository(context, _otherDependencies);
    }
}
public class LogRepositoryFactory : ILogRepositoryFactory
{
    private readonly OtherDependencies _otherDependencies;

    public LogRepositoryFactory(OtherDependencies otherDependencies)
    {
        _otherDependencies = otherDependencies;
    }
    public ILogRepository Create(OrganizationDbContext context)
    {
        return new LogRepository(context, _otherDependencies);
    }
}

The service class creates the DbContext on a per-operation basis using a mockable DbContextFactory. It then passes the DbContext through the (also mockable) repository factories in order to create repositories that will work on that DbContext.

Since the service class explicitly controls the DbContext, it can take responsibility for calling SaveChanges once and only once, satisfying the atomicity requirement above. This also means the repositories don't have to call SaveChanges, allowing us to compose their methods as needed.

Each DbContext is local to its own method, so no partial update bugs. All dependencies can be set to either 'Transient' or 'Singleton' in this code (doesn't matter), so it will work in both contexts, and there aren't any hard coded dependencies.

Although this approach satisfies all of our objectives above, it isn't perfect. As it currently stands there is no way to combine multiple service methods into a single transaction since the creation of DbContext prompts the opening of a connection.

In addition, since we're explicitly disposing DbContext as soon as the service method has completed, the service method shouldn't return any Entities if lazy loading is enabled. Trying to access a navigation property while the DbContext is disposed will throw a runtime exception (as discussed earlier).

Avoid TransactionScope

Prior to EF 6, the only way to achieve atomicity across multiple calls to ‘SaveChanges’ was using TransactionScope:

public class OrganizationRepository
{
    private readonly OrganizationDbContext _context;
    public OrganizationRepository(OrganizationDbContext context)
    {
        this._context = context;
    }

    public void DoSomething()
    {
        //the following two calls to SaveChanges are wrapped in a transaction
        using(TransactionScope scope = new TransactionScope())
        {
            _context.Employees.Find(1).Salaried = true;
            _context.SaveChanges();

            _context.Employees.Find(2).Salaried = false;
            _context.SaveChanges();

            scope.Complete();
        }
    }
}

This approach has a few problems. By design, TransactionScope can detect when a local transaction isn’t sufficient for atomicity and automatically escalates the transaction to a ‘distributed transaction’. A distributed transaction is designed to span more than one machine, and is very expensive. Unfortunately, it is surprisingly easy to inadvertently write code that triggers this escalation.

In SQL Server 2005, very simple code (Update 4 in the link) can cause TransactionScope to escalate from a local transaction to a distributed transaction. This has been fixed in SQL Server 2008 and above.

Any time you've got more than one DbContext (more than one database), and you accidentally update both within the same TransactionScope, you get an escalation -- even if this wasn't explicitly your intention. I've done this personally when adding logging to an external database to existing service methods.

Since distributed transactions are uncommon, it's better to use DbContext.Database.BeginTransaction() (example below) which always creates a lightweight local transaction. Only switch to TransactionScope if you are absolutely sure you need a distributed transaction.

public class OrganizationRepository
{
    private readonly OrganizationDbContext _context;
    public OrganizationRepository(OrganizationDbContext context)
    {
        this._context = context;
    }

    public void DoSomething()
    {
        //the following two calls to SaveChanges are wrapped in a transaction
        using (var trans = _context.Database.BeginTransaction())
        {
            _context.Employees.Find(1).Salaried = true;
            _context.SaveChanges();

            _context.Employees.Find(2).Salaried = false;
            _context.SaveChanges();

            trans.Commit();
        }
    }
}

Lastly, TransactionScope will throw exceptions on disposal in asynchronous contexts if TransactionScopeAsyncFlowOption.Enabled isn't set, whereas BeginTransaction() doesn't have this problem.

Wrapping Up

When used improperly, Entity Framework can cause subtle defects that aren't detected until late in the product's lifecycle. Since many apps' central concern is data, it pays to carefully architect the code surrounding your data to be as reliable, reusable, and performant as possible. With these guidelines in mind, you'll avoid many of the traps of Entity Framework while still leveraging its full potential.

Big thanks to Tom Straub, Jonathan Gardner, and Adam Howard for lending their guidance and experience for this post.

Share:

About The Author

App Dev Consultant

Jake is a .NET application development consultant in Cardinal's Cincinnati office with a passion for back-end technology. He’s had success working with various parts of the Microsoft stack, including MVC/Web API, SignalR, and SQL Server.