Microsoft Support WebCast
MIME-Type Handling in Microsoft Internet Explorer
January 11, 2001

Note: This document is based on the original spoken WebCast transcript. It has been edited for clarity.

Heidi Moeller: Hello. Welcome to the Microsoft Support WebCast Program. We would like to thank all of you for joining us today. Our topic will be "MIME Type Handling in Microsoft Internet Explorer," and our presenter will be Dan Mohr. I am Heidi Moeller, and I will be hosting today's session. We will start this session with Dan's presentation and follow it up with a question-and-answer period when the presentation is finished. We only answer questions submitted during the live broadcast. The Q&A portion of the Support WebCast is intended to encourage further discussion of the Support WebCast topic; however, one-on-one product support issues are outside the scope of what we are able to address during the Support WebCast. I would like to now take just a moment to introduce Dan. Dan Mohr is a Support Engineer with the Microsoft® Developer Support Internet Client team. He is currently the plug-in and browser extension specialist. Thank you so much for joining us today, Dan. Let's get started.

Dan Mohr: Thanks, Heidi. Today, I am going to be talking to you a little bit about MIME type handling in Microsoft Internet Explorer. I would like to start off by moving to our slide discussing what will be covered (slide 2), to give you a little outline for the presentation. First of all, I am going to be doing a little background on MIME; the history of it, the standardization process, etc., and then we are going to move on to talk about Internet Explorer's MIME type detection. We will be answering the question, "How does Internet Explorer determine what kind of data the server is sending it?"
Then, we will move on to the Internet Explorer MIME type handling and answer the question, "How does Internet Explorer know what to do with the data after it has it?" Next, I would like to go through a little bit on what won't be covered (slide 3). First of all, we won't be covering MIME types as they are used in e-mail programs; that is definitely enough for a separate presentation. We also won't be talking about MIME types as they relate to Asynchronous Pluggable Protocol Handlers, or as they relate to MIME filters, temporary and permanent. We will be talking a little about how Internet Explorer's default pluggable protocol handler for HTTP deals with MIME types. If you want more information on pluggable protocol handlers or MIME filters, you will want to check out the MSDN® Online Web Workshop at http://msdn.microsoft.com/workshop/networking. I will go over that link again at the end of the presentation. Also, we will be talking about registering MIME viewers, but we won't talk about the internals of creating a MIME viewer. We also won't be talking about setting up MIME types on the server. We will just assume that you have that handled already. Moving on to the next slide (slide 4), "What Is MIME?" I will be saying that acronym a lot today, so I thought I would at least give you a definition. MIME stands for Multipurpose Internet Mail Extensions, and it is basically a standard for rich e-mail content. It was created in 1992. Prior to that, e-mail messages were of a fixed length, all text, and just didn't reflect the state of multimedia computing back in the day. MIME was defined in RFC 1341. In keeping with the tradition of the Internet, new standards are often proposed as RFCs, or Requests For Comments. This document has gone through several revisions over time. The most current version actually spans five RFCs, RFCs 2045 through 2049. RFCs are published by the Internet Engineering Task Force.
Moving on to the next slide (slide 5), MIME is constructed around the MIME type, or media type. Those terms are used interchangeably, both in the standards and by me. So, just be aware of that. MIME types consist of a top-level type followed by a subtype. The two are delimited by a forward slash (/), as you can see. Top-level types are very broad and they are very standardized. Examples of these include text, application, audio, video, image, and so on. There is also a message standard top-level type, as well as a multipart top-level type. The multipart is a bit of a special case because it defines a MIME message, which contains portions of different MIME types. It can contain a plain text portion as well as an image portion. New top-level types require an RFC to be published along the standards track. And, for the most part, you would normally only create a subtype. An example is application/msword. Another convention for creating subtypes is to precede your subtype with x-. For example, application/x-test. Moving on to the next slide (slide 6), this is all well and good for e-mail programs, but in the world of HTTP, how does this process work? Although many constructs of MIME are used in HTTP, HTTP is not a MIME-compliant protocol. There is a special section of the HTTP specification, section 19.4, which discusses some of the differences between the two protocols, areas in which they differ, and some things that HTTP does not support. You can follow up on that for more information. In HTTP, the media-type or type information is communicated in the headers. There are a number of these headers that participate in the MIME process. The two most important are Content-Type and Content-Disposition. There are others, such as Content-Encoding, and the Accept headers sent from the client. All these do play a role, but by far the two most important headers are Content-Type and Content-Disposition.
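The type/subtype structure described above can be illustrated with a short sketch. This is only an illustration in Python, not anything Internet Explorer itself runs; the helper names `split_mime_type` and `uses_x_prefix` are hypothetical and chosen just for this example.

```python
def split_mime_type(mime_type):
    """Split a 'type/subtype' string into (top_level_type, subtype).

    Hypothetical helper for illustration; both parts are lower-cased.
    """
    top_level, slash, subtype = mime_type.strip().lower().partition("/")
    if not slash or not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {mime_type!r}")
    return top_level, subtype


def uses_x_prefix(mime_type):
    """Return True when the subtype follows the 'x-' naming convention."""
    _, subtype = split_mime_type(mime_type)
    return subtype.startswith("x-")


if __name__ == "__main__":
    print(split_mime_type("application/msword"))
    print(uses_x_prefix("application/x-test"))
```

The two sample values are the ones mentioned in the talk: application/msword as an ordinary subtype, and application/x-test as a subtype that uses the x- convention.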
For a little more information on media types in HTTP, Section 3.7 of the HTTP specification will provide you with a little more background. I would like to just focus on those two headers to give you a little more specific information about each of them. Moving on to the Content-Type header (slide 7), this header is defined in the HTTP specification in Section 14.17. The Content-Type header is the method by which the server will suggest a MIME type to the client. Although it may be logical that the server would verify or provide a certain MIME type to the client, Internet Explorer treats these types merely as suggestions, as we will see later. The form of the Content-Type header is pretty analogous to the MIME type format, with a type/subtype, although it does include the ability to have additional parameters, and these are delimited by semicolons (;). Most commonly, in HTTP, you will see that the parameter is a charset, or character set, used for globalization purposes. Moving on to the Content-Disposition header (slide 8). The Content-Disposition header is not defined in the HTTP specifications, although it is mentioned in Section 19.5.1, the extent of which is basically to explain that it is not part of the specification, but is instead defined in RFC 1806; that is the original RFC. The most recent update is RFC 2183. The Content-Disposition header allows a server to suggest to the client a presentation behavior, such as inline or attachment. Inline is "Please display this data within the frames of the browser." Attachment is "Please prompt the user to open or save this file." The last bullet shows an example of what this header would look like in action. You specify the presentation behavior followed by a semicolon, and then you can suggest to the client a file name. This file name is extremely important, as we will see later. How would you add this header on the server?
Well, in Microsoft Internet Information Server (IIS), you can specify, on a per-directory basis, that this header be added to the outgoing stream. In Active Server Pages (ASP) code, you can use Response.AddHeader. This is explained in more detail in the Microsoft Knowledge Base article Q260519. This article explains how to raise a File Download dialog box for a known MIME type. We will be revisiting the Content-Disposition header later on, after we go through a little more background on type handling in Internet Explorer. Moving on to the next slide (slide 9), we will start our discussion of MIME type detection, with a short foray into when and where this occurs. As a brief aside here, I need to explain a little bit about Internet Explorer's architecture. Internet Explorer, when it requests a URL, basically creates what is known as a URL moniker to retrieve the data. The processing of this moniker is carried out inside of Urlmon.dll. Urlmon will delegate these requests for URLs based on the protocol, such as HTTP, FTP, or Gopher. It will delegate these requests to a pluggable protocol handler registered for that protocol. Most of the time, this protocol handler is the Internet Explorer default HTTP protocol handler. When the protocol handler reports a MIME type, this is when Urlmon will carry out the data sniffing. The data sniffing is exposed via FindMIMEFromData, and it's called by Urlmon itself. This whole process for the Internet Explorer default handler kicks off when WinInet, which is the Dynamic-Link Library (DLL) responsible for the bit shuffling between client and server, has parsed out and returned the Content-Type header to the protocol handler, and when a buffer size of 256 bytes of data is available (or the whole file, if the file is less than 256 bytes). As we move on here to the detection steps, we need to first look at the way Internet Explorer will classify MIME types internally.
The first classification is ambiguous, and the other is known versus unknown. We will move on, now, to how Internet Explorer classifies ambiguous MIME types. Ambiguous MIME types (slide 10) are somewhat of a historical construct, in that they were much more important in the 1995 to 1997 time frame, sort of during the dawn of server and client software for the Web. Ambiguous MIME types are set up because servers don't always return the correct Content-Type header. You could have, as I said, old Web server software that just simply lacked the depth to perform the necessary operations, or, much more likely, an incorrectly configured server. For example, if we were navigating on the client to Word documents, the server might be incorrectly configured to send a Content-Type of application/octet-stream or even text/plain, or no Content-Type at all. So, Internet Explorer designates some types internally as ambiguous. The two major types that fall into this category are text/plain and application/octet-stream. Null and empty MIME type strings are also treated as ambiguous because they really don't provide any information about the data. Moving on (slide 11), the next classification we can make about a MIME type is whether it is known or unknown. Internet Explorer, internally, has a hard-coded list of MIME types that it knows. It knows a little bit more about them than simply their names. It knows whether or not they are primarily text or binary. It also knows, in many cases, what kind of signature the file could have. For example, data of an HTML format always contains an <HTML> tag. That might be an example of a hard-coded signature that Internet Explorer would use to identify data when it doesn't know the MIME type. On this slide, there are a couple of things that I would like to point out. There are two types in italics. Those two types are the ambiguous types, application/octet-stream and text/plain.
Simply because a type is ambiguous does not mean that Internet Explorer treats it as unknown. In fact, it does know text/plain and application/octet-stream. It also knows that they are ambiguous. Also, on this list, there are two file types that are asterisked (*), audio/mid and text/xml. Internet Explorer doesn't treat these two types as known, although these two file-type signatures do appear on the slide and Internet Explorer does know them. When we go through the steps for MIME type detection, you will see where the known versus unknown comes in to play, as well as the hard-coded signature in the data sniffing. Moving on to the next slide (slide 12), we will begin our discussion of the MIME type detection in Internet Explorer. There are six major steps that Internet Explorer goes through. We will go through a couple of examples to remove a little bit of the abstraction and show you how this process actually works. And we will do that after we have discussed the final step. Starting off, step 1, MIME type detection. The first thing Internet Explorer is going to do is to see if the type suggested by the server, via the Content-Type header, is either known or ambiguous. If the Content-Type is not known and is not ambiguous, Internet Explorer will simply verify that suggested MIME type. It will return from the FindMIMEFromData with the MIME type that was suggested by the server. Why does this occur? Well, new MIME types come along all the time, and they may be structurally similar to known types. You could invent a particular subtype that contained financial data or something in a plain-text format, and data sniffing could not positively ID that as your subtype. Also, here we do the check for multipart as a top-level type. If multipart is returned, we will throw that out and use data sniffing to try to determine the actual MIME type of the primary piece. Moving on to step 2 (slide 13). 
Step 2 of the detection process is the step in which data sniffing and data sampling are done. The first part of step 2 is cleaning up the buffer of data that we have received back from WinInet. We begin with an optimization. Internet Explorer is going to check to see if it has received data of a common type first, because browsing involves HTML, Graphics Interchange Format (GIF), and Joint Photographic Experts Group (JPEG) files almost exclusively. It makes sense to check for these types first to save the work of the later actions. If it does not find one of these common types, it will begin scanning the buffer to look for one of the more uncommon but still known types. As it goes through the buffer, it uses hard-coded signature tests to try to match up the data with a known file format. Examples of these are, for HTML, using HTML tags. Anything containing the angle brackets is subject to inspection. For other files, like GIF files, it will look for a string to begin the file; GIF files, for example, begin with GIF89a or GIF87a, depending on the specification used (the signature itself being plain text). For binary files, almost every binary file format has a series of "magic bytes" in common. Internet Explorer will know and look for those bytes to try to determine the type based on the data. As it is doing this buffer scanning, it will also make a note of whether the data is mostly text or mostly binary. In this step, if it finds a type matching the format of the data, it will return that type here. Moving on to step 3 (slide 14). Now, we have looked through the data returned from the server, and we have picked up any known types at this point. So, we are starting to grasp at straws, and we are going to determine if the data is of a text or binary predominance. We are going to look and see if this agrees with the suggested top-level type. For example, application, as a top-level type, is primarily binary, whereas a text top-level type is text.
If we find an agreement between the text/binary predominance and the suggested type, we will return the suggested type as verified. Why do we do this? Over time, new formats of existing known types may be added. An example of this would be image/tiff. Different Tagged Image File Format (TIFF) image-compression schemes have come out over time. Although they are the same MIME type, image/tiff, these files actually look structurally different. Our existing data sniffing hard-coded signature test may fail to positively identify these types. However, the basic format, the basic top-level type (image, in this case), really shouldn't change. That format information is stored in Urlmon. So, this just gives Internet Explorer a little bit of flexibility when having to deal with new types as they arrive. Moving on to step 4 (slide 15). We have now reached a point where we have looked for everything that is known, we have passed back anything unknown, and we have found that text/binary predominance does not match with the suggested type. Now, we are left with the file name extension from which to glean some more information about this data that the server is sending back. We are going to look in the registry for the MIME type associated with the file extension. The particular key that we are looking at is HKEY_CLASSES_ROOT\ViewerExt (where ViewerExt is the extension of the file), and underneath that there are two values that we will be concerned with. The first is a ContentType value, if present. That value will specify the MIME type. The other is the null value, the default value, which is the ProgID associated with that extension. Looking at the ContentType value: if it is not found, we will simply continue on with the detection process. If we find it and it's unknown, we will return this type because we have an unknown MIME type for which the server has not returned the correct data type. If it's found and known, the file type is rejected.
This may seem a little strange, but when you consider that we have hard-coded tests for every known MIME type, the file is considered suspect; if it hasn't been picked up by data sniffing tests, the file is of a possibly corrupt format. So, we are going to continue on with the detection process. Also, if the content type found is ambiguous, such as text/plain, no new information has been learned, so we will need to continue on, as well. In the last two cases, we will continue on to step 5. Let us look at that step now. In this step (slide 16), we will look in the registry for the application associated with the ProgID value found previously from the file extension. We will look this up under HKEY_CLASSES_ROOT\ProgID\Shell\Open\Command. This is where the shell would store the command line to be used when the user double-clicks on a file or chooses the open verb. If we find a command line here, we are going to return the type application/octet-stream. This is done to ensure that data that Internet Explorer could display inline is instead passed to the appropriate viewer. An example of this would be a batch file (a .bat file). These files are just basically plain text, but they are not to be opened inside of Internet Explorer. Internet Explorer is not supposed to show you the content, the code inside of a batch file. What it should be doing is passing on that data to the command interpreter. This step of MIME detection ensures that occurs. Moving on to step 6 (slide 17), the final step. At this point, all attempts to determine what we have received have failed. We are going to default to text/plain if the data buffer is mostly text, or to application/octet-stream if the data buffer is mostly binary. To give you a little bit better idea of how this detection process works, I would like to go through a couple of examples.
First, we will take the case of navigating to a ZIP Archive Format (ZIP) file where the server is set up correctly to return the appropriate MIME type. In this case, we will go through step 1 and we will go on to the data sniffing of step 2. Assuming our ZIP file is of the correct format (ZIP files are set up to start with the letters PK, and that is their signature), we will fall through step 1 because this is a known MIME type, and we will be caught in the data sniffing part, where the hard-coded signature test for ZIP files picks it up, and the matching type will be returned there. Our second example is much more interesting, a little more complex, and, thankfully, a little more rare. We'll make up a file type, XYZ, and navigate to it. We will give it an .xyz extension. We'll make up a MIME type for it, application/xyz. We will also say that our server is set up incorrectly and it is going to send back a Content-Type header of text/plain, despite the fact that we are an application subtype, so our data is all binary. We will fall through step 1 because we have an ambiguous type. We won't be picked up by the data sniffing of step 2. So, continuing on to step 3, the server has returned text/plain, but when Internet Explorer goes to look through the data buffer, it is going to find that we are a mostly binary file. So, we are going to fall through step 3 because our data does not agree with the suggested type. For this example, we will postulate that the .xyz extension is not registered on the client machine. So, we will be falling through steps 4 and 5 because there is no extension on the machine from which to glean any more information. So, we are going to fall all the way to step 6, and because we are mostly binary, it will return application/octet-stream. Then, Internet Explorer will start trying to find what is supposed to view this.
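The six detection steps and the two walk-throughs above can be sketched roughly in code. This is a simplified Python model written for this transcript, not Urlmon's actual logic: the `KNOWN` set is a tiny illustrative subset (the slide's full list is not reproduced here, and application/x-zip-compressed is assumed to be the ZIP type), the text-versus-binary heuristic is an assumption, and the `registry` dictionary is a hypothetical stand-in for the ContentType and Shell\Open\Command registry lookups.

```python
AMBIGUOUS = {"text/plain", "application/octet-stream", "", None}
# Illustrative subset only; an assumption, not the real hard-coded list.
KNOWN = {"text/html", "image/gif", "application/x-zip-compressed",
         "text/plain", "application/octet-stream"}
# 'application' being binary comes from the talk; the others are assumed.
BINARY_TOP_LEVEL = {"application", "image", "audio", "video"}


def sniff_signature(buf):
    """Step 2: toy versions of the hard-coded signature tests."""
    if buf.startswith(b"GIF87") or buf.startswith(b"GIF89"):
        return "image/gif"
    if buf.startswith(b"PK"):
        return "application/x-zip-compressed"
    if b"<html" in buf.lower():
        return "text/html"
    return None


def mostly_text(buf):
    """Assumed heuristic: over half the bytes are printable ASCII or whitespace."""
    if not buf:
        return True
    printable = sum(1 for b in buf if 32 <= b < 127 or b in (9, 10, 13))
    return printable * 2 > len(buf)


def detect_mime(suggested, buf, extension=None, registry=None):
    """Walk the six steps; 'registry' maps an extension to a dict that may hold
    'content_type' and 'open_command' (hypothetical stand-ins for the registry)."""
    registry = registry or {}
    buf = buf[:256]  # only the first 256 bytes are sampled
    top_level = (suggested or "").partition("/")[0]

    # Step 1: an unknown, unambiguous, non-multipart suggestion is returned as-is.
    if (suggested not in AMBIGUOUS and suggested not in KNOWN
            and top_level != "multipart"):
        return suggested

    # Step 2: signature tests on the data itself.
    sniffed = sniff_signature(buf)
    if sniffed:
        return sniffed
    is_text = mostly_text(buf)

    # Step 3: does text/binary predominance agree with the suggested top-level type?
    if (top_level == "text" and is_text) or \
            (top_level in BINARY_TOP_LEVEL and not is_text):
        return suggested

    # Step 4: a registered type is used only if it is neither known nor ambiguous.
    entry = registry.get(extension, {})
    registered = entry.get("content_type")
    if registered and registered not in KNOWN and registered not in AMBIGUOUS:
        return registered

    # Step 5: an associated open command means "hand it to that application".
    if entry.get("open_command"):
        return "application/octet-stream"

    # Step 6: last-resort default.
    return "text/plain" if is_text else "application/octet-stream"


if __name__ == "__main__":
    # Example 1: correctly served ZIP file, caught by sniffing in step 2.
    print(detect_mime("application/x-zip-compressed", b"PK\x03\x04" + bytes(20), ".zip"))
    # Example 2: binary .xyz data served as text/plain, unregistered extension.
    print(detect_mime("text/plain", bytes(range(256)), ".xyz"))
```

In this model the first call resolves in step 2 and the second falls all the way to step 6, matching the two examples in the talk.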
In most cases, you will find that we never get to step 6, and we rarely get to steps 4 and 5. Most file types are picked up via the data sniffing. Now that we have covered detection, we need to move on to MIME type handling. Internet Explorer, at this point, has determined the MIME type of the data that is being sent from the server, and now it needs to locate a viewer to present this data to the user. This functionality is also inside of Urlmon.dll. Programmatically, it's exposed through GetClassFileOrMIME. This can also encapsulate a call to FindMIMEFromData if a buffer is passed in. Obviously, Internet Explorer, at this point, will not be passing in a buffer because it has already done the FindMIMEFromData call itself. Moving along to the next slide (slide 19), we'll look at an overview of some of the MIME-viewer types that Internet Explorer will support. The first and best-case scenario would be Microsoft ActiveX® document servers. Internet Explorer has its own favorite document server, which is MSHTML, the Microsoft HTML parsing, rendering, and display engine. That is what you see any time you view a Web page. MSHTML is loaded inside of Internet Explorer. Internet Explorer, at its core, is simply an ActiveX document-server host. So, it gives preferential treatment to ActiveX document servers because that is what it was designed to host. Other examples of ActiveX document servers include the Microsoft Office applications, such as Word, PowerPoint®, and Excel. All of these programs can be hosted in an ActiveX document-server context. Next up, we have ActiveX controls. These include Adobe Acrobat Reader. Originally, that was implemented as a plug-in, but newer versions have it implemented as an ActiveX control. Other examples include the Windows Media Player, and many others. Finally, we have the misfit toys, Netscape plug-ins and the helper applications.
These are the original MIME viewers, going back all the way to the days of Netscape 2.0 and even the original Mosaic. Helper applications are out-of-process .EXE files, which are basically used to view data after Internet Explorer has already downloaded it. These have been around forever, and really require no special authoring. Internet Explorer will look for a helper application after it has failed to find any of the other MIME viewer types. Netscape plug-ins on a Windows platform are Win32® DLL files, which, in the context of Internet Explorer, are hosted inside of a very specialized ActiveX control, allowing Internet Explorer to treat plug-ins and ActiveX controls in somewhat the same manner. Moving on to the next slide (slide 20). In this process, for ActiveX viewers, the MIME type associations are stored in the registry. ActiveX controls are COM components. So, the registry is used as the standard repository for all of the relevant information. The main MIME information stored is under HKEY_CLASSES_ROOT\MIME\Database\Content Type\MIMEType. There are a lot of other keys that can come into play here, and there are specific values that can be used underneath this key. The full details of this are presented in a KB article, Q165072, and it's called "SAMPLE: MIMEType.exe Makes ActiveX Object Default MIME Type Player." It goes through an example of how to register an ActiveX control as a MIME viewer for use in Internet Explorer. Netscape plug-ins are a different case in that they are registered on-the-fly as Internet Explorer starts up. When Internet Explorer starts up, it looks in some predetermined directories; it looks for the Internet Explorer Plugins directory, the Netscape Plugins directory, and then it looks for DLL files that contain the appropriate resource strings that Netscape plug-ins do. I won't go into any depth on that, but you can consult the Netscape plug-in API call for more examples of how those DLL files are formatted. 
The MIME type and extensions handled by plug-ins are stored as resource strings in them. Internet Explorer will parse these out during startup and build a plug-in table on-the-fly during startup. It builds this table under HKEY_LOCAL_MACHINE\Software\Microsoft\Internet Explorer\Plugins. Only plug-ins are registered on-the-fly. ActiveX controls are registered at the time the component is registered. Moving on to the next slide (slide 21), there is a special case for MIME viewers. You will often see ActiveX control MIME viewers hosted on a page. For example, I might embed the Windows Media Player control on a page and use it to play a specified file; that is one case of using a MIME type viewer. The other would be when you navigate directly to a file from a hyperlink, by typing it into the address bar in Internet Explorer, or by clicking Start, selecting Run, and typing a URL. These direct navigations are done full-screen, assuming that the ActiveX control can support it. ActiveX documents are always full-screen; they take the entire Internet Explorer client area. Helper applications are also exempt from this process; they are out-of-process and they don't ever display within the confines of Internet Explorer. So, they also don't participate in this. In the case of ActiveX controls that are registered using the EnableFullPage registry key, Internet Explorer will create a dummy HTML document that hosts the plug-in full-size on the screen. This is necessary because Internet Explorer is only capable of hosting ActiveX document servers. It can't host ActiveX controls directly, so it needs a middleman to provide that translation. Internet Explorer will use its favorite document server, MSHTML, to be that middleman. It is basically going to create a very, very simple HTML document which simply has an <EMBED> tag with a height and width of 100 percent, and the MIME type, and a source to point it to the necessary data.
Now that we have a little bit more background on how Internet Explorer does its MIME type detection and specifically handling, I would like to revisit the Content-Disposition header so we can look at some of the options it affords. This header can be used to bring up an Open/Save dialog or force an inline display, if that is at all possible. That is done using the attachment or inline-specified behavior in the Content-Disposition header. Also, the file name specified by Content-Disposition is the file name that will be used when the file is cached by WinInet, by Internet Explorer. That cached file name is subsequently used in the MIME type detection process. So, when we say that we are going to parse out the file extension, we're parsing the file extension out of the cached file name. Also, you can use the file name without specifying the disposition. You can leave a blank or null disposition and simply do Content-Disposition: ;filename=<filename>. One caveat here is that this is historically problematic in Internet Explorer. In general terms, this wasn't fully implemented in Internet Explorer 4.0. It worked pretty well in Internet Explorer 5.0, and, in Internet Explorer 5.5, there are a few slight problems using this with known file types or known MIME types. In any case, it is a good tool to use when the situation arises. Next up (slide 23), a little piece of information that is not covered anywhere specifically, and I thought I would toss it in, because when you are in the situation of working alongside the Internet Explorer MIME type detection and MIME type handling code, if you are using a Netscape plug-in, you will notice from network traces that Internet Explorer does three requests as opposed to the normal one request for the data. I would just like to talk a little bit about each of these three requests and give a little information about them, just so you know what each of these pieces of the puzzle do. 
Because three requests, obviously, for a single file could cause some heavy server traffic, you can at least rid yourself of a third of that burden by understanding a little more about how the Internet Explorer architecture uses these three requests. The first request is the original request for the navigation to the plug-in data file. This data, when it is retrieved from the server, is forced through Internet Explorer's data sniffing code, as with a normal request. Then, the type is determined and a handler is located, in this case, a plug-in. Internet Explorer will then go to the trouble of creating a dummy document and instantiating this full-screen object. At this point, when it's instantiating this object, it is going to make another request to the server. This request's purpose is to simply retrieve the Content-Type a second time. This seems redundant, and it is redundant, but it is done this way for historical reasons. Again, in the past, Internet Explorer tried to exactly emulate the behavior of Netscape in this respect, down to the request level, and although Netscape subsequently had its architecture changed, Internet Explorer did not. So, this second request persists to this day. You will be able to pick up on this second request because the user-agent that goes out is contype. This request is made as a blocking request, so it has a 10-second time-out. If the server does not respond within 10 seconds, Internet Explorer will shut down the process of instantiating the plug-in. So, you will run into trouble with slow servers, or in cases where you are creating large, dynamic documents. It is important to note here, with the second request, that Internet Explorer does not look at the message body for this request. It doesn't look at it at all. It only looks at the Content-Type header.
You can use that to streamline your server-side code, when you are dynamically generating these things, and save yourself a generation round-trip at this point, in response to the user-agent of contype. The third request goes out when the full-screen object is instantiated. Internet Explorer is going to go out and retrieve the data itself and pass it along to the instantiated plug-in. That pretty much wraps it up. I have some links to go over with you and then we will have questions. Moving on to my first slide of links (slide 24), I just have some URLs up here for some of the RFCs that I talked about. The Content-Disposition RFC (http://www.ietf.org/rfc/rfc2183.txt), the MIME specification RFCs (http://www.ietf.org/rfc/rfc2045.txt), and RFC 2936 (http://www.ietf.org/rfc/rfc2936.txt), which you may find interesting — it discusses a little bit more about MIME type handling in a generic HTTP user-agent. All these RFCs are available off of the http://www.ietf.org site, the Internet Engineering Task Force. Moving on to my final links slide (slide 25), the HTTP 1.1 specification is very good reading if you are involved in development of this sort. You can get to it off the http://www.w3.org/protocols site. It is itself another RFC, 2616, available at http://www.ietf.org/rfc/rfc2616.txt. For more information, you can consult your local library, or the MSDN Online Web Workshop (http://msdn.microsoft.com/workshop/networking/), which is really the one-stop resource for Internet Explorer behavior documentation. In particular, what we have been talking about today would be covered under the Networking subdivision of the Web workshop. The Microsoft Knowledge Base also has all the KB articles that I have mentioned here today, as well as a cornucopia of other KBs. 
If you just go to http://support.microsoft.com/, and do a search on all Microsoft products, or Internet Explorer programming in particular, or Content-Type or Content-Disposition, you will find a good afternoon's worth of reading there. I am now going to hand it back to Heidi to start the Q&A. Heidi: Excellent. Thanks so much, Dan. Great presentation. It is time now to move on to the Q&A portion of this Support WebCast. I want to remind all of you, again, that this portion of the Support WebCast is intended to encourage further discussion of the Support WebCast topic; however, one-on-one product support issues are outside the scope of what we are able to address during this session. If you do need to get technical assistance, I would encourage you to either phone into Product Support Services and speak with a support professional, or submit an incident on the Web. With all that said, let's jump into the Q&A. The first question is: What does MIME stand for? Dan: Okay, MIME stands for Multipurpose Internet Mail Extensions. MIME was originally conceived as a standard for rich e-mail content. It worked so well that they decided to use it when they were working on the HTTP transaction standards. Heidi: Okay, the next one is a little bit longer, and it sounds like it is edging into support, but we are going to ask it anyway, in case you have some general ideas on this. I've had trouble with Internet Explorer 5.5 handling dynamically-generated Portable Document Format (PDF) data with Content-Type of application/pdf. All other browsers that host Acrobat Reader treat this fine; however, Internet Explorer 5.5 with Acrobat Reader 4.05 never loads the plug-in and shows a blank page. Clicking on the links directly to PDF files works fine. Dan: Okay. This is a very, very common scenario. I am assuming that you are dynamically generating your PDF data using, say, ASP or an ISAPI DLL. I will talk about those two cases, just because I am most familiar with those.
This is a prime example of when you would want to use the Content-Disposition header. If, for example, you are streaming back binary data from an ASP file, and you are streaming this back as PDF data, your extension on your URL, when parsed out, will be .asp, or .dll in the case of an ISAPI DLL. When Internet Explorer goes through its detection motions, it will find that your extension does not agree with the data you are sending back. This is because it has cached the file with the .asp or .dll extension. You can prevent the caching with that file name by setting up a Content-Disposition header on your server and passing back a made-up file name, just by using Whatever.pdf as a file name. Make sure that your Content-Type header is being set correctly to application/pdf. You can set your disposition behavior to inline or just leave it blank. One other trick, and I am glad you brought this up, is to include a query string on your URL. It seems a little hacky, but this is just playing along with the heuristics of how Internet Explorer does a lot of its work. One thing you can do is attach a query string. And, if you are familiar with the way query strings work, it will be a question mark followed by a name-value pair. And, in this case, we would do something like ?ext=.pdf. There are a couple points inside of Internet Explorer where the file extension is parsed out in a somewhat less-than-perfect manner. Adding that query string and making sure that that is the final name-value pair, if you already have a query string, should put you in a little better position because it will parse out .pdf when it goes to pull out that query string. Those are the suggestions I have for you on that. Heidi: Okay. Those sound like some great suggestions. Next question is: Is the request #2 an HTTP GET or HEAD? Dan: It is actually a GET. It should be a HEAD, but I believe at the point that the code was written it was not a guarantee that the server would respond correctly to a HEAD request.
I don't think that code has been revisited since then. So, it is a GET request, and with a NetMon trace, you should see that. One thing about that contype request. You will find that the contype request does not correctly pass on session information from ASP, in that it doesn't correctly forward cookie information. It also fails to forward authentication information. If you are familiar with HTTP, you know the authentication challenge/response information is all passed back and forth via headers. So, if you lock down a directory that has plug-in data in it, you may run into problems. What you will see in a network trace is that the contype request goes out from the client, the server responds with a challenge, and the client just basically stops because it doesn't have that information. The contype request is a very separate beast, and it has posed no small headache for Web developers. Heidi: Okay. I want to encourage all of you to send us some feedback. You can use the e-mail alias feedback@microsoft.com, and be sure to include the Support WebCast topic in the subject line. If you have comments about the Support WebCast, any suggestions for future topics you would like to see, and any comments about this session, specifically, we do encourage you to send us your feedback. We do appreciate that, and thank you very much. The next question is with regard to pluggable protocols. In addition to asking if we have any extensive documentation on the subject, there are a couple of particular questions. One of which is: What and how can I get from the IInternetBindInfo interface? And what message should I send to Internet Explorer so it would handle everything clearly? Dan: Well, that is a little bit outside the scope of the presentation, but I can give you a little information about pluggable protocol handlers inside of Internet Explorer.
The MSDN Online Web Workshop, in the Networking section (http://msdn.microsoft.com/workshop/networking), has the documentation for pluggable protocol handlers in Internet Explorer. From the IInternetBindInfo interface, you can get a lot of the header information. You can also query around to get more information in general about the bind process as it is going on. I assume that we are going to stay in the realm of MIME type handling for that final question. There are a couple of notifications you can send back to Internet Explorer regarding MIME types. They are large, large constants. They start with BINDSTATUS; one would be BINDSTATUS_MIMETYPEAVAILABLE. That is where your pluggable protocol handler would suggest a MIME type to Internet Explorer. That MIME type, again, is subject to data sniffing and verification by Internet Explorer. If you want Internet Explorer to skip the verification process because your protocol handler knows what it's doing, you can send back BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, the notice that a verified MIME type is available. If you send that back, Internet Explorer should not question that MIME type; it should just accept it as given. Heidi: Okay. Moving on to the next question: On some client machines, after installing Adobe Acrobat Reader, the browsers on the client machine fail to view the PDF file correctly, and Internet Explorer then displays the error message that an appropriate viewer could not be located. I have heard others have this problem, and it is mentioned on newsgroups. Usually this is fixed when I force the Nppdf32.dll file into the Plugin directory; however, not all the time. Do you have any idea if this is an Internet Explorer issue or an Adobe issue? Dan: I don't know if there is an easy single culprit in this case. By forcing the Nppdf32.dll file into the Plugin directory, you would be using the Acrobat plug-in, not the Acrobat ActiveX control.
The good people at Adobe could probably go into a little more depth with you on how to configure Acrobat. It is a little out of my area of expertise. But, for the most part, you want to be sure that you are using the current version of Acrobat. I believe it's version 4.05, available from Adobe. Be sure you have it set up correctly in Internet Explorer. In Adobe there's an Enable Web browser integration check box; you want to make sure that is checked. That one is tough. You want to make sure you have the most current versions. Did they give an Internet Explorer version? Heidi: They did not give an Internet Explorer version, but have you seen this issue before? Dan: I have seen this issue before. Again, in the case of viewing dynamically generated PDF files, it is subject to the information I passed on earlier about the Content-Disposition header and the query string. Otherwise, it sounds like the error "Could not locate appropriate viewer for application/pdf; would you like to go to the ActiveX gallery?" When that error message comes up, there will be a little shape in the upper left-hand corner that is yellow, red, and blue, on an otherwise blank screen. That is the screen that Internet Explorer presents when it thinks it is going to be showing a plug-in, but it could not locate the appropriate plug-in. If you have Acrobat installed correctly, you shouldn't run into this situation. I have also seen this in one other scenario, which is using long query strings appended to PDF files. I believe the Internet Explorer version was 5.01. Once the query string reached the neighborhood of 45 to 50 characters, you'd run into some problems, and it would just get confused and do that. The way around that would be to use a POST request instead of a GET. That may not directly address your question, but I think that is all the data I've got on that.
Basically, verify that you have the newest version of Acrobat and that you fully reinstall it because those file extensions can get mangled by other people. Heidi: The bottom line sounds like it is not something that is a bug that is happening to everybody all the time. Dan: Yes, it is not a black-and-white issue. A lot of these things are not black-and-white issues. While the MIME detection pieces of Internet Explorer have stayed pretty constant throughout the years, the MIME type handling pieces have changed from version to version, and Internet Explorer 5.5 is certainly no exception. Heidi: Okay. Moving on to the next question: With regard to Content-Disposition, should I use Content-Disposition to force a Save As dialog with a file name different from the original URL request? If so, can you tell me how to do this? Dan: You can, but normally you wouldn't. A typical use of Content-Disposition is this: suppose I were an anonymous online e-mail provider and I wanted to have some way for my users to get attachments, suppose I displayed to them their e-mail messages in an HTML Web page context but I wanted their attachments to be there as well, and suppose I wanted to set it up so that their attachments were simply links that they could click on and be prompted with an Open/Save dialog, much as you would see in Microsoft Outlook®. I would do this by setting up my server in the directory where I store all of the attachments to automatically send back the Content-Disposition header of attachment. You could specify a different file name, just by simply using the filename parameter in the Content-Disposition header. But, for the most part, you probably wouldn't unless you were dynamically generating them with a very strange timestamp name or something. One other thing: when you get into manipulating or always forcing prompts, there are a couple of settings in the shell that will affect you.
Specifically, the location of them in Windows 2000 is in Windows Explorer: select Tools, Folder Options, File Types. Then, you want to get to the Registered file types. In Windows 2000, there will be an Advanced button. You pick the MIME type. There are a couple settings which play into this quite heavily. They are Confirm open after download and Browse in same window. I will take the example of a Word document. If you have a Word document set up with Browse in same window checked, Internet Explorer will display it inline. If you have it cleared, Internet Explorer will always open up Word, or open up the owning application. Similarly, with Confirm open after download, if you have that box checked, Internet Explorer will always prompt you with an Open/Save dialog box. If you specify that you want a Content-Disposition of attachment, you will get two of them. The other thing is that, in certain cases, there can be a conflict between returning a behavior via the Content-Disposition header and the setting used in Browse in same window. In my testing, a Content-Disposition of attachment overrides Browse in same window, but the inverse is not true; a Content-Disposition of inline will not override a cleared Browse in same window box in the file-type association. Heidi: Okay, the next question is: With regard to WinInet, is there a way to retrieve header information from a response read by InternetOpenUrl, InternetReadFile, etc.? Dan: That one is out of my area of expertise. The WinInet API is pretty much wholly separate. Check out the MSDN Library or the MSDN Online Web Workshop. I am pretty sure the API call you are looking for is called HttpQueryInfo. It basically sounds like you want to parse out the headers. So, I'm pretty sure that's the API call you are going to want for that. Heidi: Okay.
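To make the Content-Disposition answers above concrete, here is a minimal sketch in Python of the header values a server might send: inline with a made-up .pdf file name for dynamically generated PDF data, or attachment to ask for an Open/Save prompt. The helper names content_disposition and pdf_headers are hypothetical, the quoting of the file name is a simplification of what RFC 2183 allows, and how Internet Explorer reacts still depends on the shell settings just described.

```python
def content_disposition(disposition: str, filename: str = "") -> str:
    """Return a Content-Disposition header value.

    Hypothetical helper: 'disposition' must be "inline" or "attachment";
    'filename', if given, is sent as a quoted filename parameter.
    """
    if disposition not in ("inline", "attachment"):
        raise ValueError("disposition must be 'inline' or 'attachment'")
    if not filename:
        return disposition
    # Escape backslashes and double quotes so the quoted string stays valid.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return f'{disposition}; filename="{escaped}"'


def pdf_headers(filename: str, as_attachment: bool = False) -> dict:
    """Response headers for PDF data served from a non-.pdf URL."""
    disposition = "attachment" if as_attachment else "inline"
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": content_disposition(disposition, filename),
    }
```

For the dynamically generated PDF case, pdf_headers("Whatever.pdf") gives the inline form with the made-up file name; passing as_attachment=True corresponds to the e-mail attachment scenario.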
The next question is a follow-up to a question that you already answered, and it simply is: Please repeat the query string you suggested in passing for the URL when generating dynamic PDF data. I think it was ?ext=.pdf, but I'm not sure. Dan: Exactly. If I have an ASP page, and I just call it Whatever.asp, I would want to change it so that my URL read Whatever.asp?ext=.pdf. The "ext" can be anything you want. It can be "This is cool," it can be "This is awesome," whatever. It is inconsequential. But, what we really want to do is make sure that .pdf ends our full URL string. When you have an ASP page, obviously, you can't refer to Whatever.asp by calling it Whatever.pdf. So, the query string is the means by which you do that. When the transcript comes out, it should be clear. Heidi: Okay. We are down to our last question, now. I want to, once again, encourage you to submit feedback to us. We are very interested in your feedback regarding this Support WebCast. If you have any comments about the program overall or any suggestions for future WebCasts, you can submit your feedback using the alias feedback@microsoft.com. Please be sure to include "Support WebCast" in the subject line. The last question we have in the queue right now is: We put some PDF files in the directory on our intranet server, but when we lock it down with NTLM authentication, we see in some network traces that the contype request is denied access. Is this a known problem? Dan: Yes, it is. As I was saying, the contype request does not correctly pass on that authentication information or the cookie and session information. The authentication information is cached by WinInet internally so that you don't have to authenticate every single time you navigate after you are authenticated. But, the contype request comes from outside the existing WinInet session. It does not work, unfortunately.
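The query-string suggestion repeated earlier in this Q&A can be sketched as a small Python helper that appends the hint as the final name-value pair, so that .pdf ends the URL whether or not a query string is already present. The function name with_pdf_hint and the default parameter name ext are illustrative only; as noted in the answer, the name itself is inconsequential.

```python
from urllib.parse import urlsplit, urlunsplit


def with_pdf_hint(url: str, name: str = "ext") -> str:
    """Append 'name=.pdf' as the last name-value pair of the query string.

    Hypothetical helper: the only requirement taken from the discussion
    is that the hint comes last, so the URL's query ends in ".pdf".
    """
    parts = urlsplit(url)
    hint = f"{name}=.pdf"
    # Keep any existing query pairs and put the hint after them.
    query = f"{parts.query}&{hint}" if parts.query else hint
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )
```

For example, with_pdf_hint("Whatever.asp") yields Whatever.asp?ext=.pdf, and a URL that already carries ?id=7 becomes ?id=7&ext=.pdf. Keep in mind the earlier caution that long query strings, somewhere around 45 to 50 characters, caused problems of their own in Internet Explorer 5.01.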
Heidi: Okay, with that question answered, we have cleared the queue of all the questions that were submitted during today's broadcast. I want to thank all of you for joining us today. I hope the content was useful. We do hope that you have an opportunity to join us again in the near future. Have a great day, and good bye.