Chapter 2: Web Forms Internals
Few things are harder to put up with than the annoyance of a good example.
Mark Twain
ASP.NET pages are dynamically compiled on demand when first required in the context of a Web application. Dynamic compilation is not specific to ASP.NET pages (.aspx files); it also occurs with Web Services (.asmx files), Web user controls (.ascx files), HTTP handlers (.ashx files), and ASP.NET application files such as the global.asax file. But what does it mean exactly that an ASP.NET page is compiled? How does the ASP.NET runtime turn the source code of an .aspx file into a .NET Framework compilable class? And what becomes of the dynamically created assembly when the associated source file gets updated? And finally, what happens once a compiled assembly has been associated with the requested .aspx URL?
Don't be too surprised to find all these questions crowding your mind at this time. Their presence indicates you're on the right track and ready to learn more about the underpinnings of the Web Forms programming model.
Executing ASP.NET Pages
The expression compiled page is at once precise, vague, and generic. It is precise because it tells you exactly what happens when a URL with an .aspx extension is requested. It is vague because it doesn't specify which module launches and controls the compiler and what actual input the compiler receives on the command line. Finally, it is generic because it omits a fair number of details.
In this chapter, we aim to unveil the mysteries surrounding the dynamic compilation of ASP.NET pages. We'll do this by considering the actions performed on the Web server, and which modules perform them, when a request arrives for an .aspx page.
The IIS Resource Mappings
All resources you can access on an IIS-based Web server are grouped by their file extension. Any incoming request is then assigned to a particular run-time module for actual processing. Modules that can handle Web resources within the context of IIS are ISAPI extensions, that is, plain-old Win32 DLLs that expose, much like an interface, a bunch of API functions with predefined names and prototypes. IIS and ISAPI extensions use these DLL entries as a sort of private communication protocol. When IIS needs an ISAPI extension to accomplish a certain task, it simply loads the DLL and calls the appropriate function with valid arguments. Although the ISAPI documentation doesn't mention an ISAPI extension as an interface, it is just that: a module that implements a well-known programming interface.
When the request for a resource arrives, IIS first verifies the type of the resource. Static resources such as images, text files, HTML pages, and scriptless ASP pages are resolved directly by IIS without the involvement of external modules. IIS accesses the file on the local Web server and flushes its contents to the output console so that the requesting browser can get it. Resources that require server-side elaboration are passed on to the registered module. For example, ASP pages are processed by an ISAPI extension named asp.dll. In general, when the resource is associated with executable code, IIS hands the request to that executable for further processing. Files with an .aspx extension are assigned to an ISAPI extension named aspnet_isapi.dll, as shown in Figure 2-1.
Figure 2-1 The IIS application mappings for resources with an .aspx extension.
Just like any other ISAPI extension, aspnet_isapi.dll is hosted by the IIS 5.0 process, the executable named inetinfo.exe. Resource mappings are stored in the IIS metabase.
Upon installation, ASP.NET modifies the IIS metabase to make sure that aspnet_isapi.dll can handle all the resources identified by the extensions listed in Table 2-1.
Table 2-1 IIS Application Mappings for aspnet_isapi.dll
In addition, the aspnet_isapi.dll extension handles other typical Microsoft Visual Studio .NET extensions such as .cs, .csproj, .vb, .vbproj, .licx, .config, .resx, .webinfo, and .vsdisco. Other extensions added with Visual Studio .NET 2003 for J# projects are .java, .jsl, .resources, and .vjsproj. The ASP.NET ISAPI extension doesn't process the .aspx file but acts as a dispatcher. It collects all the information available about the invoked URL and the underlying resource, and it routes the request toward another distinct process: the ASP.NET worker process.
The ASP.NET Worker Process
The ASP.NET worker process represents the ASP.NET runtime environment. It consists of a Win32 unmanaged executable named aspnet_wp.exe, which hosts the .NET common language runtime (CLR). This process is the executable you need to attach to in order to debug ASP.NET applications. The ASP.NET worker process activates the HTTP pipeline that will actually process the page request. The HTTP pipeline is a collection of .NET Framework classes that take care of compiling the page assembly and instantiating the related page class.
The connection between aspnet_isapi.dll and aspnet_wp.exe is established through a named pipe, a Win32 mechanism for transferring data over process boundaries. As the name suggests, a named pipe works like a pipe: you enter data in one end, and the same data comes out the other end. Pipes can be established both to connect local processes and processes running on remote machines. Figure 2-2 illustrates the ASP.NET layer built on top of IIS.
Figure 2-2 IIS receives page requests and forwards them to the ASP.NET runtime.
How the ASP.NET Runtime Works
A single copy of the worker process runs all the time and hosts all the active Web applications. The only exception to this situation is when you have a Web server with multiple CPUs. In this case, you can configure the ASP.NET runtime so that multiple worker processes run, one for each available CPU. A model in which multiple processes run on multiple CPUs in a single server machine is known as a Web garden and is controlled by attributes on the <processModel> section in the machine.config file. (I'll cover ASP.NET configuration files in Chapter 12.)
When a single worker process is used by all CPUs and controls all Web applications, it doesn't necessarily mean that no process isolation is achieved. Each Web application is, in fact, identified with its virtual directory and belongs to a distinct application domain, commonly referred to as an AppDomain.
A new AppDomain is created within the ASP.NET worker process whenever a client addresses a virtual directory for the first time. After creating the new AppDomain, the ASP.NET runtime loads all the needed assemblies and passes control to the HTTP pipeline to actually service the request. If a client requests a page from an already running Web application, the ASP.NET runtime simply forwards the request to the existing AppDomain associated with that virtual directory. All the assemblies needed to process the page are now ready to use because they were compiled upon the first call. Figure 2-3 provides a more general view of the ASP.NET runtime.
Figure 2-3 The ASP.NET runtime and the various AppDomains.
Process Recycling
The behavior and performance of the ASP.NET worker process are constantly monitored to catch any form of decay as soon as possible. Parameters used to evaluate the performance include the number of requests served and queued, the total life of the process, and the percentage of physical memory (60% by default) it can use. The <processModel> element of the machine.config file defines threshold values for all these parameters. The aspnet_isapi.dll checks the overall state of the current worker process before forwarding any request to it. If the process breaks one of these measures of good performance, a new worker process is started to serve the next request. The old process continues running as long as there are requests pending in its own queue. After that, when it ceases to be invoked, it goes into idle mode and is then shut down.
This automatic scavenging mechanism is known as process recycling and is one of the aspects that improve the overall robustness and efficiency of the ASP.NET platform. In this way, in fact, memory leaks and run-time anomalies are promptly detected and overcome.
Process Recycling in IIS 6.0
Process recycling is also a built-in feature of IIS 6.0 that all types of Web applications, including ASP.NET and ASP applications, automatically take advantage of. More often than not and in spite of the best efforts to properly build them, Web applications leak memory, are poorly coded, or have other run-time problems. For this reason, administrators will periodically encounter situations that necessitate rebooting or restarting a Web server. Up to the release of IIS 6.0, restarting a Web site required interrupting the entire Web server. In IIS 6.0, all user code is handled by worker processes, which are completely isolated from the core Web server. Worker processes are periodically recycled according to the number of requests they served, the memory occupied, and the time elapsed since activation.
Worker processes are also automatically shut down if they appear to hang or respond too slowly. An ad hoc module in IIS 6.0 takes care of replacing faulty processes with fresh new ones.
Configuring the ASP.NET Worker Process
The aspnet_isapi module controls the behavior of the ASP.NET worker process through a few parameters. Table 2-2 details the information that gets passed between the ASP.NET ISAPI extension and the ASP.NET worker process.
Table 2-2 Parameters of the ASP.NET Process
Default values for the arguments in Table 2-2 can be set by editing the attributes of the <processModel> section in the machine.config file. (I'll cover the machine.config file in more detail in Chapter 12.) These parameters instruct the process how to perform tasks that need to happen before the CLR is loaded. Setting COM security is just one such task, and that's why authentication-level values need to be passed to the ASP.NET worker process. What does ASP.NET have to do with COM security? Well, the CLR is actually exposed as a COM object. (Note that the CLR itself is not made of COM code, but the interface to the CLR is a COM object.) Other parameters are the information needed to hook up the named pipes between the ISAPI extension and the worker process. The names for the pipes are generated randomly and have to be communicated. The worker process retrieves the names of the pipes by using the parent process ID (that is, the IIS process) and the number of pipes created.
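As a rough illustration of the settings discussed so far, a <processModel> section might look like the following sketch. The attribute names are those of the ASP.NET 1.x configuration schema; the values are illustrative assumptions rather than recommendations, and only the 60 percent memory limit is a default stated in this chapter.

```xml
<!-- machine.config fragment; values are illustrative assumptions -->
<configuration>
  <system.web>
    <processModel
        enable="true"
        timeout="Infinite"
        idleTimeout="Infinite"
        requestLimit="Infinite"
        requestQueueLimit="5000"
        memoryLimit="60"
        webGarden="false"
        cpuMask="0xffffffff"
        comAuthenticationLevel="Connect"
        comImpersonationLevel="Impersonate" />
  </system.web>
</configuration>
```

In this sketch, timeout, requestLimit, and memoryLimit are the recycling thresholds, webGarden and cpuMask relate to the Web garden model, and the two com attributes carry the COM security settings passed to the worker process.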
About the Web Garden Model
The This-Process-Unique-ID parameter is associated with Web garden support. When multiple worker processes are used in a Web garden scenario, the aspnet_isapi.dll needs to know which process it's dealing with. Any HTTP request posted to the pipe must address a precise target process, and this information must be written into the packet sent through the pipe. The typical way of identifying a process is by means of its process ID. Unfortunately, though, aspnet_isapi.dll can't know the actual ID of the worker process being spawned because the ID won't be determined until the kernel is done with the CreateProcess API function. The following pseudocode demonstrates that the [process_id] argument of aspnet_wp.exe can't be the process ID of the same process being created!
// aspnet_isapi.dll uses this code to create a worker process
For this reason, aspnet_isapi.dll generates a unique but fake process ID and uses that ID to uniquely identify each worker process running on a multiprocessor machine configured as a Web garden. In this way, the call we just saw is rewritten as follows:
// [This-Process-Unique-ID] is a unique GUID
The worker process caches the This-Process-Unique-ID argument and uses it to recognize which named-pipe messages it has to serve.
The ASP.NET HTTP Pipeline
The ASP.NET worker process is responsible for running the Web application that lives behind the requested URL. It passes any incoming HTTP requests to the so-called HTTP pipeline, that is, the fully extensible chain of managed objects that works according to the classic concept of a pipeline. Unlike ASP pages, ASP.NET pages are not simply parsed and served to the user. While serving pages is the ultimate goal of ASP.NET, the way in which the resultant HTML code is generated is much more sophisticated than in ASP and involves many more objects.
A page request passes through a pipeline of objects that process the HTTP content and, at the end of the chain, produce some HTML code for the browser. The entry point in this pipeline is the HttpRuntime class. The ASP.NET runtime activates the HTTP pipeline by creating a new instance of the HttpRuntime class and then calling the method ProcessRequest.
The HttpRuntime Object
Upon creation, the HttpRuntime object initializes a number of internal objects that will help carry out the page request. Helper objects include the cache manager and the file system monitor used to detect changes in the files that form the application.
When the ProcessRequest method is called, the HttpRuntime object starts working to serve a page to the browser. It creates a new context for the request and initializes a specialized text writer object in which the HTML code will be accumulated. A context is given by an instance of the HttpContext class, which encapsulates all HTTP-specific information about the request. The text writer is an instance of the HttpWriter class and is the object that actually buffers text sent out through the Response object.
After that, the HttpRuntime object uses the context information to either locate or create a Web application object capable of handling the request. A Web application is searched using the virtual directory information contained in the URL.
The object used to find or create a new Web application is HttpApplicationFactory, an internal-use object responsible for returning a valid object capable of handling the request.
The Application Factory
During the lifetime of the application, the HttpApplicationFactory object maintains a pool of HttpApplication objects to serve incoming HTTP requests. When invoked, the application factory object verifies that an AppDomain exists for the virtual folder the request targets. If the application is already running, the factory picks an HttpApplication out of the pool of available objects and passes it the request. A new HttpApplication object is created if an existing object is not available. If the virtual folder has not yet been called, a new HttpApplication object for the virtual folder is created in a new AppDomain. In this case, the creation of an HttpApplication object entails the compilation of the global.asax application file, if any is present, and the creation of the assembly that represents the actual page requested. An HttpApplication object is used to process a single page request at a time; multiple objects are used to serve simultaneous requests for the same page.
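The pooling behavior just described can be pictured with a small, self-contained sketch. This is not the actual HttpApplicationFactory code, which is internal and not shown in this chapter. The PooledApplication and ApplicationPool names are hypothetical, and the sketch uses generic collections, which postdate the .NET Framework 1.x. It models only one rule: reuse a free object when there is one, and create a new object otherwise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an application object; not the real HttpApplication.
public sealed class PooledApplication
{
    public readonly int Id;
    public PooledApplication(int id) { Id = id; }
}

// Hypothetical pool: hand out a free instance if one exists, otherwise create one.
public sealed class ApplicationPool
{
    private readonly Stack<PooledApplication> _free = new Stack<PooledApplication>();
    private readonly object _sync = new object();
    private int _created;

    // Total number of instances created so far.
    public int CreatedCount
    {
        get { lock (_sync) { return _created; } }
    }

    // Called when a request arrives.
    public PooledApplication Acquire()
    {
        lock (_sync)
        {
            if (_free.Count > 0) return _free.Pop();
            _created++;
            return new PooledApplication(_created);
        }
    }

    // Called only after the request has completed, so the instance can be reused.
    public void Release(PooledApplication app)
    {
        if (app == null) throw new ArgumentNullException("app");
        lock (_sync) { _free.Push(app); }
    }
}
```

Two overlapping requests each get their own object. Once the first request completes and its object is released, a later request reuses that object instead of creating a third one.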
The HttpApplication Object
HttpApplication is a global.asax-derived object that the ASP.NET worker process uses to handle HTTP requests that hit a particular virtual directory. A particular HttpApplication instance is responsible for managing the entire lifetime of the request it is assigned to, and the instance of HttpApplication can be reused only after the request has been completed. The HttpApplication class defines the methods, properties, and events common to all application objects (.aspx pages, user controls, Web services, and HTTP handlers) within an ASP.NET application.
The HttpApplication maintains a list of HTTP module objects that can filter and even modify the content of the request. Registered modules are called during various moments of the elaboration as the request passes through the pipeline. HTTP modules represent the managed counterpart of ISAPI filters and will be covered in greater detail in Chapter 23.
The HttpApplication object determines the type of object that represents the resource being requested, typically an ASP.NET page. It then uses a handler factory object to either instantiate the type from an existing assembly or dynamically create the assembly and then an instance of the type. A handler factory object is a class that implements the IHttpHandlerFactory interface and is responsible for returning an instance of a managed class that can handle the HTTP request: an HTTP handler. An ASP.NET page is simply a handler object, that is, an instance of a class that implements the IHttpHandler interface.
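To make the notion of a handler concrete, here is a minimal sketch of a class that implements IHttpHandler directly. The HelloHandler name and the text it writes are illustrative assumptions. The two members shown, IsReusable and ProcessRequest, are the ones the interface defines.

```csharp
using System.Web;

// Hypothetical handler; the class name and output text are illustrative.
public class HelloHandler : IHttpHandler
{
    // Returning true tells the runtime that the same instance may serve later requests.
    public bool IsReusable
    {
        get { return true; }
    }

    // Called by the runtime to produce the response for a request.
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello from a custom HTTP handler");
    }
}
```

A class like this is typically bound to a URL through the <httpHandlers> section of a configuration file. A page reaches the same ProcessRequest entry point through the Page class rather than through a hand-written handler.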
The Handler Factory
The HttpApplication determines the type of object that must handle the request, and it delegates the type-specific handler factory to create an instance of that type. Let's see what happens when the resource requested is a page. Once the HttpApplication object in charge of the request has figured out the proper handler, it creates an instance of the handler factory object. For a request that targets a page, the factory is an undocumented class named PageHandlerFactory.
The page handler factory is responsible for either finding the assembly that contains the page class or dynamically creating an ad hoc assembly. The System.Web namespace defines a few handler factory classes. These are listed in Table 2-3.
Table 2-3 Handler Factory Classes in the .NET Framework
Bear in mind that handler factory objects do not compile the requested resource each time it is invoked. The compiled code is stored in a directory on the Web server and used until the corresponding resource file is modified.
So the page handler factory creates an instance of an object that represents the particular page requested. This object inherits from the System.Web.UI.Page class, which in turn implements the IHttpHandler interface. The page object is built as an instance of a dynamically generated class based on the source code embedded in the .aspx file. The page object is returned to the application factory, which passes that back to the HttpRuntime object. The final step accomplished by the ASP.NET runtime is calling the ProcessRequest method on the page object. This call causes the page to execute the user-defined code and generate the HTML text for the browser. Figure 2-5 illustrates the overall HTTP pipeline architecture.
Figure 2-5 The HTTP pipeline processing for a page.
The ASP.NET Page Factory Object
Let's examine in detail how the .aspx page is converted into a class and compiled into an assembly. Generating an assembly for a particular .aspx resource is a two-step process. First, the source code for the class is created by merging the content of the <script> section with the code-behind file, if any. Second, the dynamically generated class is compiled into an assembly and cached in a well-known directory.
Locating the Assembly for the Page
Assemblies generated for ASP.NET pages are cached in the Temporary ASP.NET Files folder. The path for version 1.1 of the .NET Framework is as follows.
%SystemRoot%\Microsoft.NET\Framework\v1.1.4322\Temporary ASP.NET Files
Of course, the directory depends on the version of the .NET Framework you installed. The directory path for version 1.0 of the .NET Framework includes a subdirectory named v1.0.3705. The Temporary ASP.NET Files folder has one child directory for each application ever executed. The name of the subfolder matches the name of the virtual directory of the application. Pages that run from the Web server's root folder are grouped under the Root subfolder.
Page-specific assemblies are cached in a subdirectory placed a couple levels down the virtual directory folder. The names of these child directories are fairly hard to make sense of. Names are the result of a hash algorithm based on some randomized factor along with the application name. A typical path is shown in the following listing. The last two directories (in boldface) have fake but realistic names.
\Framework
Regardless of the actual algorithm implemented to determine the folder names, from within an ASP.NET application the full folder path is retrieved using the following, pretty simple, code:
string tempAspNetDir = HttpRuntime.CodegenDir;
So much for the location of the dynamic assembly. So how does the ASP.NET runtime determine the assembly name for a particular .aspx page? The assembly folder contains a few XML files with a particular naming convention:
[filename].[hashcode].xml
If the page is named, say, default.aspx, the corresponding XML file can be named like this:
default.aspx.2cf84ad4.xml
The XML file is created when the page is compiled. This is the typical content of this XML file:
<preserve assem="c5gaxkyh" type="ASP.Default_aspx"
I'll say more about the schema of the file in a moment. For now, it will suffice to look at the assem attribute. The attribute value is just the name of the assembly (without extension) created to execute the default.aspx page. Figure 2-6 provides a view of the folder.
Figure 2-6 Temporary ASP.NET Files: a view of interiors.
The file c5gaxkyh.dll is the assembly that represents the default.aspx page. The other assembly is the compiled version of the global.asax file. (If not specified, a standard global.asax file is used.) The objects defined in these assemblies can be viewed with any class browser tool, including Microsoft IL Disassembler, ILDASM.exe.
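For readers who want to inspect these files themselves, the following sketch reads the assem attribute from the text of such an XML file. It assumes only what the chapter states, namely a root element named preserve with an assem attribute. The PreserveFileReader class and its GetAssemblyName method are hypothetical helpers, and the file layout is an internal detail that could differ between versions.

```csharp
using System;
using System.Xml;

// Hypothetical helper; relies only on the <preserve assem="..."> shape shown in this chapter.
public static class PreserveFileReader
{
    // Returns the value of the assem attribute on the root <preserve> element,
    // or null when the root element or the attribute is not what we expect.
    public static string GetAssemblyName(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlElement root = doc.DocumentElement;
        if (root == null || root.Name != "preserve") return null;

        // GetAttribute returns an empty string when the attribute is absent.
        string assem = root.GetAttribute("assem");
        return assem.Length == 0 ? null : assem;
    }
}
```

Given a complete, well-formed file whose root element carries assem="c5gaxkyh", the method returns c5gaxkyh. Appending .dll gives the file name of the page assembly in the same folder.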
Detecting Page Changes
As mentioned earlier, the dynamically compiled assembly is cached and used to serve any future request for the page. However, changes made to an .aspx file will automatically invalidate the assembly, which will be recompiled to serve the next request. The link between the assembly and the source .aspx file is kept in the XML file we mentioned a bit earlier. Let's recall it:
<preserve assem="c5gaxkyh" type="ASP.Default_aspx" hash="fffffeda266fd5f7">
The name attribute of the <filedep> node contains just the full path of the file associated with the assembly whose name is stored in the assem attribute of the <preserve> node. The type attribute, on the other hand, contains the name of the class that renders the .aspx file in the assembly. The actual object running when, say, default.aspx is served is an instance of a class named ASP.Default_aspx.
Based on the Win32 file change notification system, this ASP.NET feature enables developers to quickly build applications with a minimum of process overhead. Users, in fact, can "just hit Save" to cause code changes to immediately take effect within the application. In addition to this development-oriented benefit, deployment of applications is greatly enhanced by this feature, as you can simply deploy a new version of the page that overwrites the old one.
When a page is changed, it's recompiled as a single assembly, or as part of an existing assembly, and reloaded. ASP.NET ensures that the next user will be served the new page by the new assembly. Current users, on the other hand, will continue viewing the old page served by the old assembly. The two assemblies are given different (because they are randomly generated) names and therefore can happily live side by side in the same folder as well as be loaded in the same AppDomain. Because that was so much fun, let's drill down a little more into this topic.
How ASP.NET Replaces Page Assemblies
When a new assembly is created for a page as the effect of an update, ASP.NET verifies whether the old assembly can be deleted. If the assembly contains only that page class, ASP.NET attempts to delete the assembly. Often, though, it finds the file loaded and locked, and the deletion fails. In this case, the old assembly is renamed by adding a .DELETE extension.
(All executables loaded in Windows can be renamed at any time, but they cannot be deleted until they are released.) Renaming an assembly in use is no big deal in this case because the image of the executable is already loaded in memory and there will be no need to reload it later. The file, in fact, is destined for deletion. Note that .DELETE files are cleaned up when the directory is next accessed in sweep mode, so to speak. The directory, in fact, is not scavenged each time it is accessed but only when the application is restarted or an application file (global.asax or web.config) changes. Each ASP.NET application is allowed a maximum number of recompiles (with 15 as the default) before the whole application is restarted. The threshold value is set in the machine.config file. If the latest compilation exceeds the threshold, the AppDomain is unloaded and the application is restarted. Bear in mind that the atomic unit of code you can unload in the CLR is the AppDomain, not the assembly. Put another way, you can't unload a single assembly without unloading the whole AppDomain. As a result, when a page is recompiled, the old version stays in memory until the AppDomain is unloaded because either the Web application exceeded its limit of recompiles or the ASP.NET worker process is taking up too much memory.
Batch Compilation
Compiling an ASP.NET page takes a while. Even though you pay this price only once per page, there are situations in which you would rather avoid it altogether. Unfortunately, as of version 1.1, ASP.NET lacks a tool (or a built-in mechanism) to scan the entire tree of a Web application and do a precompilation of all pages. However, you can always request each page of a site before the site goes live or, better yet, have an ad hoc application do it.
In effect, since version 1.0, ASP.NET has supported batch compilation, but this support takes place only at run time. ASP.NET attempts to batch into a single compiled assembly as many pages as possible without exceeding the configured maximum batch size. Furthermore, batch compilation groups pages by language, and it doesn't group in the same assembly pages that reside in different directories.
Just as with many other aspects of ASP.NET, batch compilation is highly configurable and is a critical factor for overall site performance. Fine-tuning the related parameters in the <compilation> section of the machine.config file is important and should save you from having and loading 1000 different assemblies for as many pages or from having a single huge assembly with 1000 classes inside. Notice, though, that the problem here is not only with the size and the number of the assemblies but also with the time needed to recompile the assemblies in case of updates.
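As a hedged illustration, the compilation settings mentioned in this section and the previous one might appear in a configuration file as follows. The attribute names belong to the ASP.NET 1.x <compilation> schema. Apart from the recompile limit of 15, which this chapter gives as the default, the values are illustrative assumptions and should be checked against your own machine.config file.

```xml
<!-- machine.config fragment; values are illustrative assumptions -->
<configuration>
  <system.web>
    <compilation
        batch="true"
        batchTimeout="15"
        maxBatchSize="1000"
        maxBatchGeneratedFileSize="3000"
        numRecompilesBeforeAppRestart="15" />
  </system.web>
</configuration>
```

In this sketch, maxBatchSize caps the number of pages per batched assembly, maxBatchGeneratedFileSize caps the combined size of the generated source, and numRecompilesBeforeAppRestart is the threshold after which the AppDomain is unloaded and the application restarts.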
How ASP.NET Creates a Class for the Page
An ASP.NET page executes as an instance of a type that, by default, inherits from System.Web.UI.Page. The page handler factory creates the source code for this class by putting a parser to work on the content of the physical .aspx file. The parser produces a class written in the language the developer specified. The class belongs to the ASP namespace and is given a file-specific name. Typically, it is the name and the extension of the file with the dot (.) replaced by an underscore (_). If the page is default.aspx, the class name will be ASP.Default_aspx. You can check the truthfulness of this statement with the following simple code:
void Page_Load(object sender, EventArgs e)
As mentioned earlier, when the page runs with the Debug attribute set to true, the ASP.NET runtime does not delete the source code used to create the assembly. Let's have a quick look at the key parts of the source code generated. (Complete sources are included in this book's sample code.)
Reviewing the Class Source Code
For a better understanding of the code generated by ASP.NET, let's first quickly review the starting point, the .aspx source code:
<%@ Page Language="C#" Debug="true" %>
The following listing shows the source code that ASP.NET generates to process the preceding page. The text in boldface type indicates code extracted from the .aspx file:
namespace ASP
All the controls in the page marked with the runat attribute are rendered as protected properties of the type that corresponds to the tag. Those controls are instantiated and initialized in the various __BuildControlXXX methods. The initialization is done using the attributes specified in the .aspx page. The build method for the form adds child-parsed objects to the HtmlForm instance. This means that all the parent-child relationships between the controls within the form are registered. The __BuildControlTree method ensures that all controls in the whole page are correctly registered with the page object. All the members defined in the <script> block are copied verbatim as members of the new class with the same level of visibility you declared. The base class for the dynamically generated source is Page unless the code-behind approach is used. In that case, the base class is just the code-behind class. We'll return to this later in "The Code-Behind Technique" section.
Processing the Page Request
The HttpRuntime object governs the HTTP pipeline in which a page request is transformed into a living instance of a Page-derived class. The HttpRuntime object causes the page to generate its HTML output by calling the ProcessRequest method on the Page-derived class that comes out of the pipeline. ProcessRequest is a method defined on the IHttpHandler interface that the Page class implements.
The Page Life Cycle
Within the base implementation of ProcessRequest, the Page class first calls the FrameworkInitialize method, which, as seen in the source code examined a moment ago, builds the controls tree for the page. Next, ProcessRequest makes the page go through various phases: initialization, loading of view-state information and postback data, loading of the page's user code, and execution of postback server-side events. After that, the page enters rendering mode: the updated view state is collected, and the HTML code is generated and sent to the output console. Finally, the page is unloaded and the request is considered completely served.
During the various phases, the page fires a few events that Web controls and user-defined code can intercept and handle. Some of these events are specific to controls and can't be handled at the level of the .aspx code. In theory, a page that wants to handle a certain event should explicitly register an appropriate handler. However, for backward compatibility with the Visual Basic programming style, ASP.NET also supports a form of implicit event hooking. By default, the page tries to match method names and events and considers the method a handler for the event. For example, a method named Page_Load is the handler for the page's Load event. This behavior is controlled by the AutoEventWireup attribute on the @Page directive. If the attribute is set to false, any applications that want to handle an event need to connect explicitly to the page event.
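As a minimal sketch of explicit registration, consider a code-behind class like the one below. The class name ExplicitWireupPage and the handler name HandleLoad are hypothetical. The sketch also assumes that the page's @Page directive sets AutoEventWireup to false and names this class in its Inherits attribute.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind class; the names are illustrative.
public class ExplicitWireupPage : Page
{
    // OnInit runs before the Load event fires, so the handler is attached in time.
    protected override void OnInit(EventArgs e)
    {
        this.Load += new EventHandler(this.HandleLoad);
        base.OnInit(e);
    }

    // Deliberately not named Page_Load: it runs only because of the
    // explicit registration above, not because of name matching.
    private void HandleLoad(object sender, EventArgs e)
    {
        Response.Write("Load event handled explicitly");
    }
}
```

Because the handler is attached by code, the runtime has no need to look for methods whose names match page events.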
The following code shows how to proceed from within a page class:
// C# code
By proceeding this way, you will enable the page to get a slight performance boost by not having to do the extra work of matching names and events. By default, pages created with Visual Studio .NET have the AutoEventWireup attribute set to false.
Page-Related Events
To handle a page-related event, an ASP.NET page can either hook up the event (for example, Load) or, in a derived class, override the corresponding method (for example, OnLoad). The second approach provides for greater flexibility because you can decide whether and when to call the base method, which, in the end, fires the event to the user code. Let's review in detail the various phases in the page life cycle: