On a recent project I ran into a problematic caching issue when using the System.Net.HttpWebRequest class to retrieve a simple JSON document from a URL.
The initial GET successfully retrieved the most recent version from the web server, but subsequent requests returned an aggressively cached copy of that first response. This persisted even though the server version had changed (confirmed by loading the same URL in a web browser). A fresh copy was only retrieved if I either restarted the application or waited about 5 minutes.
Apparently this issue is the result of a mismatch between the internal settings of the WebRequest-related classes and the server-side configuration, which causes the server-side cache to keep returning an out-of-date response.
What doesn't work
After searching for a solution, I discovered that this is a persistent problem with the WebRequest classes, so I tried every variation of the "solutions" that I was able to find online.
These are the proposed solutions from around the internet that I tested and that did not work:
- Change the server code. The server in question is not under my control, so this was not an option. link, link.
- Modify the Client Cache policies (HttpRequestCachePolicy) to bypass any local caches. link, link.
- Bypass the local cache by adding a fake query string value to the URL for every request (e.g. ?r=some random value). link, link, link.
- Manually delete the WinInetCache entry that the CLR supposedly uses by P/Invoking WinInet.dll. link.
- Wait for the ~5 min required between requests so that the cache is bypassed. Nope, this is not an acceptable solution.
- Restarting the application every time. Yeah this is not an acceptable solution.
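For reference, the two client-side attempts in the list above (the cache policy and the fake query string value) typically look something like the following sketch. The class and method names are made up for illustration and are not taken from the linked posts.

```csharp
using System;
using System.Net;
using System.Net.Cache;

// Hypothetical helper names, used only to illustrate the two attempts.
public static class CacheBypassAttempts
{
    // Attempt 1: ask the client-side stack to neither read from
    // nor write to its local cache for this request.
    public static HttpWebRequest CreateUncachedRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CachePolicy =
            new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);
        return request;
    }

    // Attempt 2: append a throwaway query string value so that
    // every request URL is unique.
    public static string AddCacheBuster(string url)
    {
        string separator = url.Contains("?") ? "&" : "?";
        return url + separator + "r=" + DateTime.UtcNow.Ticks;
    }
}
```

Both only influence what the client does. If the stale copy is served by a cache on the server side, neither is guaranteed to help, which matches what I observed.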
Solution
I finally found this post on StackOverflow that discusses a solution by writing your own Socket code. This set me on the right path and I successfully solved the problem by handling the socket connection manually.
This is regrettably not the ideal way to do it, as implementing an HTTP/S connection over a socket is non-trivial and has many edge and special cases that are difficult to get right. The code below is therefore not suitable for general production use as is, but you can adapt it so that all of your cases are handled correctly.
Below is the main retrieval function. It connects the socket, performs a simple GET and returns the raw result. Note that the result contains all of the header data at the top, followed by the response body, so some parsing is needed.
private string GetDataUsingSocket(string url, CookieContainer cookieContainer = null)
{
    // Optionally add a random element to the url if the correct macro is present
    url = url.Replace("%RAND%", DateTime.Now.Ticks.ToString());
    // Structured uri to be able to extract components
    var uri = new Uri(url);
    // Create the socket and connect it
    var clientSocket = ConnectSocket(uri);
    if (clientSocket == null || !clientSocket.Connected)
        return null;
    try
    {
        return PerformGet(uri, clientSocket, cookieContainer)?.ToString();
    }
    catch (Exception)
    {
        return null;
    }
    finally
    {
        clientSocket.Close();
        clientSocket.Dispose();
    }
}
The function is called like this:
// Normal url
GetDataUsingSocket("http://www.example.org");
// Using a random value to bypass local cache
GetDataUsingSocket("http://www.example.org?rnd=%RAND%");
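Since the returned string still contains the status line and headers, a small parser is needed to get at the JSON. The following is a minimal sketch of that step, assuming the raw response fits in memory as a single string; the RawHttpResponse class is a hypothetical helper, not part of the code in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that splits a raw HTTP response string.
public static class RawHttpResponse
{
    public static (string StatusLine, Dictionary<string, string> Headers, string Body)
        Parse(string raw)
    {
        // Headers and body are separated by the first empty line
        int split = raw.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        string head = split >= 0 ? raw.Substring(0, split) : raw;
        string body = split >= 0 ? raw.Substring(split + 4) : "";

        string[] lines = head.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // Header names are case-insensitive. A repeated header
        // (e.g. Set-Cookie) keeps only its last value in this sketch.
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 1; i < lines.Length; i++)
        {
            int colon = lines[i].IndexOf(':');
            if (colon > 0)
            {
                string name = lines[i].Substring(0, colon).Trim();
                headers[name] = lines[i].Substring(colon + 1).Trim();
            }
        }
        return (lines[0], headers, body);
    }
}
```

Calling RawHttpResponse.Parse(GetDataUsingSocket(url)) would then give the status line, the headers and the body separately. Be aware that an HTTP/1.1 server may answer with Transfer-Encoding: chunked, in which case the body still contains chunk-size lines that this sketch does not decode.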
Here is how the socket is connected
private Socket ConnectSocket(Uri uri)
{
    // Uri.Port honors an explicit port in the url and otherwise falls back
    // to the default for the scheme (80 for http, 443 for https)
    int port = uri.Port;
    IPHostEntry hostEntry;
    // Resolve the server name
    try
    {
        hostEntry = Dns.GetHostEntry(uri.Host);
    }
    catch (Exception)
    {
        return null;
    }
    // Attempt to connect on each address returned from DNS. Return once successfully connected
    foreach (IPAddress address in hostEntry.AddressList)
    {
        Socket clientSocket = null;
        try
        {
            clientSocket = new Socket(address.AddressFamily,
                                      SocketType.Stream,
                                      ProtocolType.Tcp);
            var remoteEndPoint = new IPEndPoint(address, port);
            clientSocket.Connect(remoteEndPoint);
            return clientSocket;
        }
        catch (SocketException)
        {
            // This address failed: release the socket and try the next one
            clientSocket?.Dispose();
        }
    }
    // None of the addresses accepted the connection
    return null;
}
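One of the edge cases mentioned earlier is HTTPS: connecting the socket to port 443 is not enough, because the server expects a TLS handshake before any HTTP text is sent. A minimal sketch of how this could be handled is to wrap the connected socket in a stream and, for https URLs, layer an SslStream on top. The TransportSketch class and OpenStream method are hypothetical names, and the request would then have to be written to and read from the returned stream instead of calling Send and Receive on the socket directly.

```csharp
using System;
using System.IO;
using System.Net.Security;
using System.Net.Sockets;

// Hypothetical helper, not part of the code in this post.
public static class TransportSketch
{
    // Returns a stream for an already connected socket, performing
    // the TLS handshake first when the url uses the https scheme.
    public static Stream OpenStream(Socket connectedSocket, Uri uri)
    {
        // ownsSocket: true means disposing the stream also closes the socket
        Stream stream = new NetworkStream(connectedSocket, ownsSocket: true);

        if (uri.Scheme == Uri.UriSchemeHttps)
        {
            var ssl = new SslStream(stream, leaveInnerStreamOpen: false);
            // Validates the server certificate against the host name
            ssl.AuthenticateAsClient(uri.Host);
            stream = ssl;
        }
        return stream;
    }
}
```

For plain http URLs the sketch simply returns the NetworkStream, so the same read and write code can serve both schemes.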
How the GET request is constructed and any necessary cookies are propagated:
private StringBuilder PerformGet(Uri uri, Socket clientSocket, CookieContainer cookieContainer)
{
    // Create the cookie header if there are any cookies
    string cookies = "";
    if (cookieContainer != null && cookieContainer.Count > 0)
    {
        // Join as "name=value; name=value" without a trailing separator
        cookies = "\r\nCookie: " + string.Join("; ",
            GetAllCookies(cookieContainer)
                .Cast<Cookie>()
                .Select(cookie => cookie.Name + "=" + cookie.Value));
    }
    // Format the HTTP GET request string
    string getRequest = "GET " + uri.PathAndQuery +
                        " HTTP/1.1\r\nHost: " + uri.Host +
                        "\r\nConnection: Close" + cookies +
                        "\r\n\r\n";
    var getBuffer = Encoding.ASCII.GetBytes(getRequest);
    // Send the GET request to the connected server
    clientSocket.Send(getBuffer);
    // Create a buffer that is used to read the response
    byte[] rData = new byte[1024];
    // Read until the server closes the connection and save the ASCII data in a string
    var response = new StringBuilder();
    int bytesRead = clientSocket.Receive(rData);
    while (bytesRead != 0)
    {
        // Decode only the bytes that were actually received
        response.Append(Encoding.ASCII.GetString(rData, 0, bytesRead));
        bytesRead = clientSocket.Receive(rData);
    }
    return response;
}
Obtaining the cookies from the CookieContainer requires a little bit of hacking with reflection (discussion).
private static CookieCollection GetAllCookies(CookieContainer cookieJar, string scheme = "https")
{
    var cookieCollection = new CookieCollection();
    // Read the private domain table via reflection (relies on internal field names)
    Hashtable table = (Hashtable)cookieJar.GetType()
        .InvokeMember("m_domainTable",
                      BindingFlags.NonPublic |
                      BindingFlags.GetField |
                      BindingFlags.Instance,
                      null, cookieJar, new object[]{});
    foreach (string rawKey in table.Keys)
    {
        // Skip dots in the beginning, the key value is the domain name for the cookies
        var key = rawKey.TrimStart('.');
        // Read the private list of cookie paths. The lookup must use the raw key,
        // because the trimmed key is not present in the table
        var pathList = table[rawKey];
        var list = (SortedList)pathList.GetType()
            .InvokeMember("m_list",
                          BindingFlags.NonPublic |
                          BindingFlags.GetField |
                          BindingFlags.Instance,
                          null, pathList, new object[]{});
        foreach (var uri in list.Keys.Cast<string>()
            .Select(listkey => new Uri(scheme + "://" + key + listkey)))
        {
            cookieCollection.Add(cookieJar.GetCookies(uri));
        }
    }
    return cookieCollection;
}
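If only the cookies that apply to the requested URL are needed, the reflection can be avoided: CookieContainer exposes a public GetCookieHeader(Uri) method that returns the ready-made header value for a given URI. The sketch below shows how that could be used to build the header line; the CookieHeaderSketch class and BuildCookieHeaderLine method are hypothetical names. Note that this behaves differently from the reflection approach, which sends every cookie in the container regardless of domain.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the code in this post.
public static class CookieHeaderSketch
{
    // Returns "\r\nCookie: name=value; ..." for the cookies that match
    // the uri, or an empty string when there is nothing to send.
    public static string BuildCookieHeaderLine(Uri uri, CookieContainer cookieContainer)
    {
        if (cookieContainer == null)
            return "";

        // Only cookies whose domain and path match the uri are included
        string value = cookieContainer.GetCookieHeader(uri);
        return string.IsNullOrEmpty(value) ? "" : "\r\nCookie: " + value;
    }
}
```

The returned string can be concatenated into the request text in the same position as the cookies variable in PerformGet.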
Initializing the cookie container
Sometimes it is necessary to initialize session cookies and other supporting cookie information, e.g. the user might have to log in first.
To do this, the function below can be used to pre-populate the CookieContainer that is used by the PerformGet function above.
private bool TryInitializeCookies(string url, ref CookieContainer cookieContainer)
{
    // Optionally add a random element to the url if the correct macro is present
    url = url.Replace("%RAND%", DateTime.Now.Ticks.ToString());
    // Default cache policy (note: this changes the default for all later requests)
    HttpWebRequest.DefaultCachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);
    // If there is no cookie jar yet then do a fake call to get a session cookie
    try
    {
        if (cookieContainer == null)
        {
            cookieContainer = new CookieContainer();
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);
            request.CookieContainer = cookieContainer;
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                int cookieCount = cookieContainer.Count;
                if (cookieCount <= 0)
                    throw new InvalidOperationException("Did not receive any valid session cookies from url '" + url + "'");
            }
        }
        return true; // Success
    }
    catch (Exception)
    {
        return false;
    }
}
References
https://msdn.microsoft.com/en-us/library/system.net.webrequest.cachepolicy(v=vs.110).aspx
http://blog.technovert.com/2013/01/httpclient-caching/
http://stackoverflow.com/questions/532211/how-to-clear-the-cache-of-httpwebrequest
http://stackoverflow.com/questions/18288744/httpwebrequest-and-httpwebresponse-shows-old-data
http://stackoverflow.com/questions/4461610/wp7-httpwebrequest-without-caching
http://stackoverflow.com/questions/4230938/httpwebrequest-cachepolicy-questions-about-caching
http://stackoverflow.com/questions/3812089/c-sharp-webclient-disable-cache