Reading HTTP 1.1 responses from a real web server in C#
I've been asked the same question rather a lot recently, so I thought that I'd write a blog post on it. Here's the question:
Why does my network client fail to connect when it is using HTTP/1.1?
I encountered this same problem, and after half an hour of debugging I found the cause: it wasn't failing to connect at all; rather, it was failing to read the response from the server. Consider the following program:
using System;
using System.IO;
using System.Net.Sockets;

class Program
{
    static void Main(string[] args)
    {
        // Connect to the remote web server on port 80
        TcpClient client = new TcpClient("host.name", 80);
        client.SendTimeout = 3000;
        client.ReceiveTimeout = 3000;
        StreamWriter writer = new StreamWriter(client.GetStream());
        StreamReader reader = new StreamReader(client.GetStream());
        // HTTP requires CRLF line endings, whatever platform we happen to be running on
        writer.NewLine = "\r\n";
        // Send a minimal HTTP/1.1 request
        writer.WriteLine("GET /path HTTP/1.1");
        writer.WriteLine("Host: server.name");
        writer.WriteLine();
        writer.Flush();
        // Read the entire response
        string response = reader.ReadToEnd();
        Console.WriteLine("Got Response: '{0}'", response);
    }
}
If you change the hostname and request path, and then compile and run it, you (might) get the following error:
An unhandled exception of type 'System.IO.IOException' occurred in System.dll
Additional information: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond.
Strange. I'm sure that we sent the request. Let's try reading the response line by line:
string response = string.Empty;
do
{
    // Read and accumulate the response one line at a time
    string nextLine = reader.ReadLine();
    response += nextLine;
    Console.WriteLine("> {0}", nextLine);
} while (reader.Peek() != -1); // keep going while there's more data to read
Here's some example output from my server:
> HTTP/1.1 200 OK
> Server: nginx/1.9.10
> Date: Tue, 09 Feb 2016 15:48:31 GMT
> Content-Type: text/html
> Transfer-Encoding: chunked
> Connection: keep-alive
> Vary: Accept-Encoding
> strict-transport-security: max-age=31536000;
>
> 2ef
> <html>
> <head><title>Index of /libraries/</title></head>
> <body bgcolor="white">
> <h1>Index of /libraries/</h1><hr><pre><a href="../">../</a>
> <a href="prism-grammars/">prism-grammars/</a>
09-Feb-2016 13:56 -
> <a href="blazy.js">blazy.js</a> 09-F
eb-2016 13:38 9750
> <a href="prism.css">prism.css</a> 09-
Feb-2016 13:58 11937
> <a href="prism.js">prism.js</a> 09-F
eb-2016 13:58 35218
> <a href="smoothscroll.js">smoothscroll.js</a>
20-Apr-2015 17:01 3240
> </pre><hr></body>
> </html>
>
> 0
>
...but we still get the same error. Why? The reason is that the web server is keeping the connection open, just in case we want to send another request. While this is usually helpful (a web browser, for example, will probably want to download some images or other resources after receiving the initial response), it's rather a nuisance for us: we don't want to send another request, and it's rather awkward to detect the end of the response without being able to detect the end of the stream (that's what the while (reader.Peek() != -1); is for in the example above).
Thankfully, there are a few solutions to this. Firstly, the web server will sometimes (but not always - take the example response above for starters) send a Content-Length header. This header tells you how many bytes follow the double newline (\r\n\r\n) that separates the response headers from the response body, so we can use it to detect the end of the message. This is the recommended way, according to RFC 2616.
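Here's a rough sketch of how that might look, reusing the reader from the program above. The helper name is just for illustration, and it assumes the body uses a single-byte encoding (such as ASCII) so that the character count matches the byte count - a more robust client would read the body as raw bytes instead:
static string ReadWithContentLength(StreamReader reader)
{
    int contentLength = -1;

    // Read the status line and headers until the blank line that ends them
    string line;
    while (!string.IsNullOrEmpty(line = reader.ReadLine()))
    {
        Console.WriteLine("> {0}", line);
        if (line.StartsWith("Content-Length:", StringComparison.OrdinalIgnoreCase))
            contentLength = int.Parse(line.Substring("Content-Length:".Length).Trim());
    }
    if (contentLength < 0)
        throw new InvalidDataException("The server didn't send a Content-Length header.");

    // Read exactly contentLength characters of body, then stop - no need to wait
    // for the server to close the connection
    char[] body = new char[contentLength];
    int totalRead = 0;
    while (totalRead < contentLength)
    {
        int read = reader.Read(body, totalRead, contentLength - totalRead);
        if (read == 0) break; // the server closed the connection early
        totalRead += read;
    }
    return new string(body, 0, totalRead);
}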
Another way to cheat here is to send the Connection: close header. This instructs the web server to close the connection after sending its response (note that this will break some of the tests in the ACW, so don't use this method!). Then we can use reader.ReadToEnd() as normal.
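In that case the only change from the first program is an extra header in the request - something along these lines:
// Same request as before, but we ask the server to close the connection once
// it has finished responding, so the end of the stream marks the end of the response
writer.WriteLine("GET /path HTTP/1.1");
writer.WriteLine("Host: server.name");
writer.WriteLine("Connection: close");
writer.WriteLine();
writer.Flush();

string response = reader.ReadToEnd(); // returns once the server closes the connection
Console.WriteLine("Got Response: '{0}'", response);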
A further cheat would be to detect the expected end of the message that we're looking for. For HTML this will practically always be </html>, so we could close the connection ourselves after receiving that line (although this doesn't work when you're not receiving HTML). This is seriously not a good idea: the HTML could be malformed and not contain </html> at all.
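For completeness, here's roughly what that check might look like, built on the line-by-line loop from earlier (client and reader come from the first program) - again, don't rely on this in practice:
// Read line by line until we spot the closing </html> tag - a fragile heuristic
string response = string.Empty;
string nextLine;
do
{
    nextLine = reader.ReadLine();
    response += nextLine + "\n";
    Console.WriteLine("> {0}", nextLine);
} while (nextLine != null && !nextLine.Contains("</html>"));
client.Close(); // close the connection ourselves once we've seen the end marker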