WebRequest and related caching issues in .NET

Getting around excessive local caching when using WebRequests to retrieve URLs in C#

On a recent project I ran into a problematic caching issue when using the System.Net.HttpWebRequest class to retrieve a simple JSON document from a URL.

The initial GET successfully retrieved the most recent version from the web server, but subsequent requests returned an aggressively cached copy of that first response. This persisted even though the version on the server had changed (confirmed by loading the same URL in a web browser). Only restarting the application or waiting for about 5 minutes caused a fresh copy to be retrieved from the server.

Apparently this issue is caused by a mismatch between the internal settings of the WebRequest-related classes and the server-side configuration, which makes the server-side cache keep returning an out-of-date response.

What doesn't work

After searching for a solution, I discovered that this is a persistent problem with the WebRequest classes. I tried every variation of the "solutions" that I was able to find online.

These are the proposed solutions I tested, none of which worked:

  1. Change the server code. The server in question is not under my control, so this was not an option (see References).
  2. Modify the client cache policies (HttpRequestCachePolicy) to bypass any local caches (see References).
  3. Bypass the local cache by adding a fake query-string value to the URL for every request, e.g. ?r=some random value (see References).
  4. Manually delete the WinINet cache entry that the CLR supposedly uses, by P/Invoking WinInet.dll (see References).
  5. Wait the ~5 minutes required between requests for the cache to be bypassed. Nope, this is not an acceptable solution.
  6. Restart the application every time. Yeah, this is not an acceptable solution either.
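For reference, attempts 2 and 3 look roughly like this in code. This is a minimal sketch rather than the exact code I ran: the class name CacheBypassAttempts, the helper names and the query parameter name r are placeholders.

```csharp
using System;
using System.Net;
using System.Net.Cache;

// Hypothetical helpers illustrating attempts 2 and 3 from the list above
public static class CacheBypassAttempts
{
    // Attempt 3: append a throw-away query-string value so every URL is unique.
    // The parameter name "r" is an arbitrary placeholder.
    public static Uri AddCacheBuster(Uri uri, string paramName = "r")
    {
        var builder = new UriBuilder(uri);
        string buster = paramName + "=" + DateTime.UtcNow.Ticks;

        // UriBuilder.Query returns the query with its leading '?', or "" if there is none;
        // the setter adds the '?' itself, so strip it before appending
        builder.Query = string.IsNullOrEmpty(builder.Query)
            ? buster
            : builder.Query.TrimStart('?') + "&" + buster;
        return builder.Uri;
    }

    // Attempt 2: ask the client-side cache to neither read nor store the response
    public static HttpWebRequest CreateUncachedRequest(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);
        return request;
    }
}
```

In my case, neither of these made the stale responses go away.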

Solution

I finally found a post on Stack Overflow that proposes writing your own socket code instead. This set me on the right path, and I solved the problem by handling the socket connection manually.

Regrettably, this is not an ideal approach: implementing an HTTP/S client on top of a raw socket is non-trivial and has many edge cases that are difficult to get right. In particular, the code below writes plain-text HTTP straight to the socket and performs no TLS handshake, so it only works for http:// URLs; it also does not follow redirects or decode chunked responses. This makes it unsuitable for general production use, but you can extend it to handle the cases you need.

Below is the main retrieval function. It connects the socket, performs a simple GET request and returns the raw result. Note that the result contains the status line and all of the response headers at the top, followed by the response body (the JSON in my case), so some parsing is needed.

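// Namespaces used by the code in this post
using System;
using System.Collections;
using System.Linq;
using System.Net;
using System.Net.Cache;
using System.Net.Sockets;
using System.Reflection;
using System.Text;

private string GetDataUsingSocket(string url, CookieContainer cookieContainer = null)
{
  // Optionally add a random element to the url if the %RAND% macro is present
  url = url.Replace("%RAND%", DateTime.Now.Ticks.ToString());

  // Structured uri to be able to extract components
  var uri = new Uri(url);

  // Create the socket and connect it
  var clientSocket = ConnectSocket(uri);
  if (clientSocket == null || !clientSocket.Connected)
    return null;

  try
  {
    return PerformGet(uri, clientSocket, cookieContainer)?.ToString();
  }
  catch (Exception)
  {
    // Any network error is reported to the caller as null
    return null;
  }
  finally
  {
    // Close() releases all socket resources, so a separate Dispose() is not needed
    clientSocket.Close();
  }
}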

The function is called like so:

// Normal url
GetDataUsingSocket("http://www.example.org");

// Using a random value to bypass local cache
GetDataUsingSocket("http://www.example.org?rnd=%RAND%");
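Since the returned string is the raw response, the status line and headers have to be separated from the body before the JSON can be used. Below is a minimal parsing sketch; RawHttpResponse and RawHttpParser are hypothetical names, and it assumes the body is not chunk-encoded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type holding the parts of a raw HTTP response
public sealed class RawHttpResponse
{
    public int StatusCode { get; set; }

    // Header names are case-insensitive in HTTP
    public Dictionary<string, string> Headers { get; } =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public string Body { get; set; } = "";
}

// Hypothetical parser for the string returned by GetDataUsingSocket
public static class RawHttpParser
{
    public static RawHttpResponse Parse(string raw)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));

        var result = new RawHttpResponse();

        // The header block ends at the first empty line (CRLF CRLF)
        int split = raw.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string head = split >= 0 ? raw.Substring(0, split) : raw;
        result.Body = split >= 0 ? raw.Substring(split + 4) : "";

        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // Status line, e.g. "HTTP/1.1 200 OK"
        string[] statusParts = lines[0].Split(' ');
        if (statusParts.Length < 2 || !int.TryParse(statusParts[1], out int code))
            throw new FormatException("Unexpected status line: " + lines[0]);
        result.StatusCode = code;

        // Remaining lines are "Name: value"; a repeated header (e.g. Set-Cookie)
        // keeps only its last value in this simple sketch
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon <= 0) continue;
            string name = lines[i].Substring(0, colon).Trim();
            result.Headers[name] = lines[i].Substring(colon + 1).Trim();
        }

        return result;
    }
}
```

Because the request is sent as HTTP/1.1, the server is allowed to reply with Transfer-Encoding: chunked, in which case Body will still contain the hexadecimal chunk-size lines and needs further decoding.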

Here is how the socket is connected:

private Socket ConnectSocket(Uri uri)
{
  // Uri.Port is the port given in the URL, or the scheme default (80 for http, 443 for https)
  int port = uri.Port;

  IPHostEntry hostEntry;

  // Resolve the server name
  try
  {
    hostEntry = Dns.GetHostEntry(uri.Host);
  }
  catch (Exception)
  {
    return null;
  }

  // Attempt to connect on each address returned from DNS. Return as soon as one succeeds
  foreach (IPAddress address in hostEntry.AddressList)
  {
    Socket clientSocket = null;
    try
    {
      clientSocket = new Socket(address.AddressFamily,
                                SocketType.Stream,
                                ProtocolType.Tcp);
      clientSocket.Connect(new IPEndPoint(address, port));
      return clientSocket;
    }
    catch (SocketException)
    {
      // This address failed: release the socket and try the next one
      clientSocket?.Dispose();
    }
  }

  // None of the addresses accepted the connection
  return null;
}
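Assuming .NET Framework 4.5 or later, much of the DNS loop above can probably be delegated to the framework: Socket.Connect(string, int) resolves the host name and tries each returned address in turn. A sketch, with SimpleConnect as a placeholder class name:

```csharp
using System;
using System.Net.Sockets;

// Hypothetical alternative to ConnectSocket with the same contract:
// returns a connected socket, or null if no address accepted the connection
public static class SimpleConnect
{
    public static Socket ConnectSocket(Uri uri)
    {
        // Socket(SocketType, ProtocolType) creates a dual-mode (IPv6 + IPv4)
        // socket when the OS supports IPv6, otherwise an IPv4 socket
        var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
        try
        {
            // Resolves DnsSafeHost and tries each address; Uri.Port falls back
            // to the scheme default (80/443) when the URL has no explicit port
            socket.Connect(uri.DnsSafeHost, uri.Port);
            return socket;
        }
        catch (SocketException)
        {
            socket.Dispose();
            return null;
        }
    }
}
```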

How the GET request is constructed and how any necessary cookies are propagated:

private StringBuilder PerformGet(Uri uri, Socket clientSocket, CookieContainer cookieContainer)
{
  // Create the cookie header line if there are any cookies
  string cookies = "";
  if (cookieContainer != null && cookieContainer.Count > 0)
  {
    var pairs = GetAllCookies(cookieContainer)
                  .Cast<Cookie>()
                  .Select(cookie => cookie.Name + "=" + cookie.Value)
                  .ToList();
    if (pairs.Count > 0)
      cookies = "\r\nCookie: " + string.Join("; ", pairs);
  }

  // Format the HTTP GET request string. uri.Authority is the host plus any non-default port
  string getRequest = "GET " + uri.PathAndQuery +
            " HTTP/1.1\r\nHost: " + uri.Authority +
            "\r\nConnection: Close" + cookies +
            "\r\n\r\n";

  var getBuffer = Encoding.ASCII.GetBytes(getRequest);

  // Send the GET request to the connected server
  clientSocket.Send(getBuffer);

  // Read the response until the server closes the connection. Decode as UTF-8
  // (a superset of ASCII) through a Decoder, so multi-byte characters that are
  // split across two reads are not corrupted
  byte[] rData = new byte[1024];
  var decoder = Encoding.UTF8.GetDecoder();
  char[] chars = new char[Encoding.UTF8.GetMaxCharCount(rData.Length)];
  var response = new StringBuilder();

  int bytesRead;
  while ((bytesRead = clientSocket.Receive(rData)) > 0)
  {
    int charCount = decoder.GetChars(rData, 0, bytesRead, chars, 0);
    response.Append(chars, 0, charCount);
  }

  return response;
}
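Because PerformGet writes plain text straight to the socket, an https:// URL will not work: the server expects a TLS handshake first. One way to add that, sketched below under the assumption that PerformGet is changed to write and read through a Stream instead of calling Send/Receive on the socket, is to wrap the socket in an SslStream. HttpStreams.OpenHttpStream is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Net.Security;
using System.Net.Sockets;

// Hypothetical helper: picks the stream the raw HTTP request is written to
public static class HttpStreams
{
    // Returns a plain NetworkStream for http://, or an SslStream that has
    // completed the TLS handshake for https://
    public static Stream OpenHttpStream(Socket connectedSocket, Uri uri)
    {
        // ownsSocket: true, so disposing the returned stream also closes the socket
        var networkStream = new NetworkStream(connectedSocket, ownsSocket: true);
        if (!string.Equals(uri.Scheme, Uri.UriSchemeHttps, StringComparison.OrdinalIgnoreCase))
            return networkStream;

        var sslStream = new SslStream(networkStream, leaveInnerStreamOpen: false);
        try
        {
            // Performs the TLS handshake; by default the server certificate
            // must be valid for uri.Host or an exception is thrown
            sslStream.AuthenticateAsClient(uri.Host);
            return sslStream;
        }
        catch
        {
            sslStream.Dispose();
            throw;
        }
    }
}
```

PerformGet would then call stream.Write(getBuffer, 0, getBuffer.Length) and read the response with stream.Read in the same loop as before.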

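Obtaining all of the cookies from the CookieContainer requires a little bit of reflection hacking, because the .NET Framework version of the class only exposes cookies per URI. This relies on private implementation details, so it may break on other runtimes or future versions.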

private static CookieCollection GetAllCookies(CookieContainer cookieJar, string scheme = "https")
{
  var cookieCollection = new CookieCollection();

  // m_domainTable and m_list are private fields of the .NET Framework implementation
  Hashtable table = (Hashtable)cookieJar.GetType()
                      .InvokeMember("m_domainTable",
                            BindingFlags.NonPublic |
                            BindingFlags.GetField  |
                            BindingFlags.Instance,
                            null, cookieJar, new object[]{});
  foreach (string rawKey in table.Keys)
  {
    // Skip dots in the beginning, the key value is the domain name for the cookies
    var key = rawKey.TrimStart('.');

    // Read the private list of paths for this domain. Index the table with the
    // untrimmed key, because the trimmed one may not exist in the table
    var pathList = table[rawKey];
    var list = (SortedList)pathList.GetType()
                      .InvokeMember("m_list",
                              BindingFlags.NonPublic |
                              BindingFlags.GetField  |
                              BindingFlags.Instance,
                              null, pathList, new object[]{});

    foreach (var uri in list.Keys.Cast<string>()
                   .Select(listkey => new Uri(scheme + "://" + key + listkey)))
    {
      cookieCollection.Add(cookieJar.GetCookies(uri));
    }
  }

  return cookieCollection;
}
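If the goal is only to build the Cookie header for the URL being requested, the public CookieContainer.GetCookieHeader(Uri) method may be a simpler alternative to the reflection above. It only returns cookies the container considers applicable to that URI, whereas PerformGet sends every stored cookie regardless of domain. (If you target .NET 6 or later, there is also a public CookieContainer.GetAllCookies() method.) The helper name BuildCookieLine is a placeholder; its result is meant to stand in for the cookies string built in PerformGet.

```csharp
using System;
using System.Net;

public static class CookieHeaders
{
    // Hypothetical helper: returns the "\r\nCookie: ..." fragment that PerformGet
    // appends to its request, or "" when no stored cookie applies to this URI
    public static string BuildCookieLine(CookieContainer cookieContainer, Uri uri)
    {
        if (cookieContainer == null)
            return "";

        // GetCookieHeader returns "name1=value1; name2=value2" for the cookies
        // that match this URI, or an empty string when none do
        string header = cookieContainer.GetCookieHeader(uri);
        return string.IsNullOrEmpty(header) ? "" : "\r\nCookie: " + header;
    }
}
```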

Initializing the cookie container

Sometimes it is necessary to initialize session cookies and other supporting cookie information first, e.g. when the user has to log in. To do this, the function below can be used to pre-populate the CookieContainer that is later passed to the PerformGet function above.

private bool TryInitializeCookies(string url, ref CookieContainer cookieContainer)
{
  // Optionally add a random element to the url if the %RAND% macro is present
  url = url.Replace("%RAND%", DateTime.Now.Ticks.ToString());

  // Default cache policy (note: this is a process-wide default for HttpWebRequest)
  HttpWebRequest.DefaultCachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);

  // If the cookie jar is empty then do a fake call to get a session cookie
  try
  {
    if (cookieContainer == null)
    {
      var newContainer = new CookieContainer();
      var request = (HttpWebRequest)WebRequest.Create(url);

      request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);
      request.CookieContainer = newContainer;

      using (var response = (HttpWebResponse)request.GetResponse())
      {
        if (newContainer.Count <= 0)
          throw new InvalidOperationException("Did not receive any valid session cookies from url '" + url + "'");
      }

      // Only hand the container back once it holds cookies, so that a failed
      // attempt leaves the reference null and can be retried
      cookieContainer = newContainer;
    }

    return true; // Success
  }
  catch (Exception)
  {
    return false;
  }
}
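TryInitializeCookies relies on HttpWebRequest to fill the container. If the cookies are instead handed out by responses fetched over the socket (for example a log-in response), the Set-Cookie headers could be copied out of the raw response with CookieContainer.SetCookies. A minimal sketch, assuming the raw string format returned by GetDataUsingSocket; SetCookieCapture and CaptureCookies are placeholder names.

```csharp
using System;
using System.Net;

public static class SetCookieCapture
{
    // Hypothetical helper: copies every "Set-Cookie:" header in a raw HTTP
    // response into the container, scoped to the URI that was requested.
    // Returns how many Set-Cookie headers were processed.
    public static int CaptureCookies(string rawResponse, Uri requestUri, CookieContainer cookieContainer)
    {
        const string prefix = "Set-Cookie:";

        // Only look at the header block, which ends at the first empty line
        int headerEnd = rawResponse.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string head = headerEnd >= 0 ? rawResponse.Substring(0, headerEnd) : rawResponse;

        int processed = 0;
        foreach (string line in head.Split(new[] { "\r\n" }, StringSplitOptions.None))
        {
            if (!line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                continue;

            // SetCookies parses the header value, including attributes such as Path
            cookieContainer.SetCookies(requestUri, line.Substring(prefix.Length).Trim());
            processed++;
        }
        return processed;
    }
}
```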

References

https://msdn.microsoft.com/en-us/library/system.net.webrequest.cachepolicy(v=vs.110).aspx

http://blog.technovert.com/2013/01/httpclient-caching/

http://stackoverflow.com/questions/532211/how-to-clear-the-cache-of-httpwebrequest

http://stackoverflow.com/questions/18603157/c-sharp-why-is-my-httpwebrequest-returning-a-cached-response-every-time

http://stackoverflow.com/questions/18288744/httpwebrequest-and-httpwebresponse-shows-old-data

http://stackoverflow.com/questions/4461610/wp7-httpwebrequest-without-caching

http://stackoverflow.com/questions/4230938/httpwebrequest-cachepolicy-questions-about-caching

http://stackoverflow.com/questions/31471272/how-to-disable-cache-for-asynchronous-httpwebrequest-for-windows-phone-applicati

http://stackoverflow.com/questions/3812089/c-sharp-webclient-disable-cache


