Updated on 23/10/2009: I’ve updated the code to fix the bogus regular expressions (the WordPress editor stripped some bits from them) and to account for a slight change Google made to the sign-in process. For those interested, Google now uses 2 cookies/hidden vars instead of 1; the code is almost the same though.
A simple task like signing in to Google hardly seems like it should take more than a few lines and a little research, especially with so much talk about Google web services and APIs all the time. The truth is that it’s not always that simple: Google has APIs for some services and not others, and the authentication APIs are limited to those services most of the time. In my case, trying to connect to AdSense, there wasn’t a suitable API unless I wanted to use the service on a website, had 100,000 daily pageviews, and, to fully comply with the EULA, implemented things I didn’t need.
Google search proved almost useless this time too. Searching for “something AdSense” will undoubtedly return a lot of pages that use AdSense advertising, but not what you were actually searching for. The few exceptions, found with tweaked search terms, always turned out to be old and broken pieces of code in all sorts of programming languages (no C# code though…). So I went and did some HTTP spying using Firebug and HttpFox.
Working with HTTP in C# is usually quite pleasant, at least until you have to handle cookies and POST requests; that is where life gets a bit more complicated. After much struggling, I feel that this rather small piece of code doesn’t really do justice to the time and effort I put into the solution:
    using System;
    using System.Net;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Web; // HttpUtility

    public class AdSenseScraper
    {
        protected WebClient webclient = new CookieAwareWebClient();

        // Google URLs
        static protected Uri checkLoggedInUri = new Uri("https://www.google.com/adsense");
        static protected Uri loginPageUri = new Uri("https://www.google.com/accounts/ServiceLoginBox?service=adsense&ltmpl=login&ifr=true&rm=hide&fpui=3&nui=15&alwf=true&passive=true&continue=https%3A%2F%2Fwww.google.com%2Fadsense%2Flogin-box-gaiaauth&followup=https%3A%2F%2Fwww.google.com%2Fadsense%2Flogin-box-gaiaauth&hl=en_US");
        static protected Uri loginPostFormUri = new Uri("https://www.google.com/accounts/ServiceLoginBoxAuth");
        static protected Uri checkCookieUri = new Uri("https://www.google.com/accounts/CheckCookie?continue=https%3A%2F%2Fwww.google.com%2Fadsense%2Flogin-box-gaiaauth&followup=https%3A%2F%2Fwww.google.com%2Fadsense%2Flogin-box-gaiaauth&hl=en_US&service=adsense&ltmpl=login&chtml=LoginDoneHtml");

        public bool SignIn(string username, string password)
        {
            byte[] response;
            string ga3t, newUrl, galx;

            // Get the initial cookies and read the GA3T and GALX values from them
            response = webclient.DownloadData(loginPageUri);
            ga3t = Regex.Match(webclient.ResponseHeaders["Set-Cookie"], "GA3T=(?<ga3t>[^;]*)").Groups["ga3t"].Value;
            galx = Regex.Match(webclient.ResponseHeaders["Set-Cookie"], "GALX=(?<galx>[^;]*)").Groups["galx"].Value;

            if (String.IsNullOrEmpty(ga3t))
                return false;

            // Post the login form, echoing both cookie values back as form fields
            webclient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
            response = webclient.UploadData(loginPostFormUri, "POST", Encoding.UTF8.GetBytes("continue=https%3A%2F%2Fwww.google.com%2Fadsense%2Flogin-box-gaiaauth&followup=https%3A%2F%2Fwww.google.com%2Fadsense%2Flogin-box-gaiaauth&service=adsense&nui=15&fpui=3&ifr=true&rm=hide&ltmpl=login&hl=en_US&alwf=true&ltmpl=login&GA3T=" + ga3t + "&GALX=" + galx + "&Email=" + HttpUtility.UrlEncode(username) + "&Passwd=" + HttpUtility.UrlEncode(password) + "&null=Sign+in"));
            webclient.Headers.Remove("Content-Type");

            // The CheckCookie page contains the URL of the next step
            response = webclient.DownloadData(checkCookieUri);
            newUrl = Regex.Match(HttpUtility.HtmlDecode(Encoding.UTF8.GetString(response)), "url='(?<url>.*)'").Groups["url"].Value;

            if (String.IsNullOrEmpty(newUrl))
                return false;

            // New url should be a redirect to SetID which finalizes the login process
            response = webclient.DownloadData(newUrl);
            return true;
        }
    }
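The cookie extraction inside SignIn can be tried in isolation, without touching the network. The following is only a minimal sketch: the class name CookieValueDemo, the helper ExtractCookieValue, and the sample header are made up for illustration, and the real header values will differ from what Google sent at the time.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CookieValueDemo
{
    // Returns the value of the named cookie from a raw Set-Cookie header,
    // or an empty string when the cookie is not present.
    public static string ExtractCookieValue(string setCookieHeader, string name)
    {
        if (String.IsNullOrEmpty(setCookieHeader))
            return String.Empty;

        // (?<value>...) is a named capturing group; [^;]* stops at the
        // first semicolon, which ends the cookie's value.
        Match match = Regex.Match(setCookieHeader, Regex.Escape(name) + "=(?<value>[^;]*)");
        return match.Success ? match.Groups["value"].Value : String.Empty;
    }

    public static void Main()
    {
        // Hypothetical header: WebClient joins several Set-Cookie headers with commas.
        string header = "GA3T=abc123; Path=/accounts; Secure,GALX=xyz789; Path=/accounts; Secure";

        Console.WriteLine(ExtractCookieValue(header, "GA3T")); // prints "abc123"
        Console.WriteLine(ExtractCookieValue(header, "GALX")); // prints "xyz789"
    }
}
```

Returning an empty string for a missing cookie keeps the same String.IsNullOrEmpty check that SignIn already uses.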
By the way, I needed a special implementation of WebClient to get the cookies correctly persisted between requests for “free”. Here’s the class, which I copied from somewhere:
    class CookieAwareWebClient : WebClient
    {
        // Shared by every request made through this client, so cookies persist
        private CookieContainer m_container = new CookieContainer();

        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest request = base.GetWebRequest(address);
            if (request is HttpWebRequest)
                ((HttpWebRequest)request).CookieContainer = m_container;
            return request;
        }
    }
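What CookieAwareWebClient relies on is that a CookieContainer remembers cookies and hands back only the ones that match a given URL. A small offline sketch of that behaviour follows; the class name CookieContainerDemo, the helper CookieHeaderFor, and the cookie value are invented for illustration, and in the real client the container is filled from server responses rather than by hand.

```csharp
using System;
using System.Net;

public static class CookieContainerDemo
{
    // Returns the Cookie header the container would send to the given URL.
    public static string CookieHeaderFor(CookieContainer container, string url)
    {
        return container.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        CookieContainer container = new CookieContainer();

        // Hypothetical cookie, added manually instead of arriving in a response.
        container.Add(new Cookie("GALX", "xyz789", "/", "www.google.com"));

        // Same host: the stored cookie is offered for the request.
        Console.WriteLine(CookieHeaderFor(container, "https://www.google.com/accounts/ServiceLoginBoxAuth"));

        // Different host: nothing is sent.
        Console.WriteLine(CookieHeaderFor(container, "https://example.com/"));
    }
}
```

Because the same container instance is attached to every HttpWebRequest, cookies set by one response are sent with the next request automatically.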
Well, that’s the code, and it’s quite self-explanatory. I hope to write a nice GNOME applet to display AdSense stats now. Also, I did a first (for me) by writing a complete Objective C port, which I won’t publish (not today at least) since it’s an easy one-to-one port and right now I feel really emotionally attached to it… LOL
Tags: AdSense, C#, Cookies, HttpFox, Login, Objective C, Sign in, WebClient
You are rocking dude, really a nice post. I will write about this soon, with a trackback, on my own AdSense Help blog too.
Hey,
I found this information extremely useful and managed to use it as a base for an AdSense checking app. It was all working fine till today! Something has stopped working – I presume Google may have changed something? Do you have any ideas what?
Thanks
I tried to run your code and I get the following exception:
System.ArgumentException was unhandled
Message=”parsing \”GA3T=(?.*)\” – Unrecognized grouping construct.”
Source=”System”
StackTrace:
at System.Text.RegularExpressions.RegexParser.ScanGroupOpen()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
at System.Text.RegularExpressions.Regex.Match(String input, String pattern)
at Microsoft.Samples.MediaCatalog.Form1.GetAdSense() in C:\Work\Manager\Form1.cs:line 260
at Microsoft.Samples.MediaCatalog.Form1..ctor(Boolean bPost) in C:\Work\Manager\Form1.cs:line 335
at Microsoft.Samples.MediaCatalog.Program.Main(String[] args) in C:\Work\Manager\Program.cs:line 31
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
Any idea?
Thanks
I can’t get this code to work. Please help me!
Google changed their login structure, and this code does not work anymore. Please advise on proper modifications. A PHP version (WITH changes made) has been posted here: http://www.garyshood.com/conkyadsense/ and if you scroll down to the comments you can see the specific changes – but I don’t know PHP. Can anybody offer advice on how to modify THIS code? Thanks
Hi,
I’ve updated the code to fix the bogus regular expressions (the WordPress editor stripped some bits from them) and to account for a slight change Google made to the sign-in process. For those interested, Google now uses 2 cookies/hidden vars instead of 1; the code is almost the same though.
Let me know if there are still problems.
I also am getting an unrecognized grouping construct error:
Server Error in ‘/’ Application.
parsing “url='(?.*)'” – Unrecognized grouping construct.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.ArgumentException: parsing “url='(?.*)'” – Unrecognized grouping construct.
Source Error:
Line 53:
Line 54: response = webclient.DownloadData(checkCookieUri);
Line 55: newUrl = Regex.Match(HttpUtility.HtmlDecode(Encoding.UTF8.GetString(response)), “url='(?.*)'”).Groups[“url”].Value;
Line 56:
Line 57: if (String.IsNullOrEmpty(newUrl))
Thanks!
-Jason
It’s the damn WordPress editor: any text that looks like an HTML tag just gets stripped out, and in this case that was the name part of the named grouping constructs… Anyway, I’ve fixed it like I did for the ga3t and galx ones, so you can just check the updated line 30.
Two additions:
– Loop through all the hidden elements and add their values to the postData:
    string postData = String.Empty;
    foreach (string input in ExtractTagArray(response, "input"))
        if (input.IndexOf("hidden") > 0)
            postData += ((postData.Length > 0) ? "&" : "") +
                GetValue(input, "name") + "=" + HttpUtility.UrlEncode(GetValue(input, "value"));
    postData += "&Email=" + HttpUtility.UrlEncode(username) +
        "&Passwd=" + HttpUtility.UrlEncode(password);
– Remove line 29
The two extra methods I am using:
    // Returns every <tag ...> opening tag found in source
    private ArrayList ExtractTagArray(string source, string tag)
    {
        ArrayList Results = new ArrayList();
        string Pattern = String.Format(@"<{0}[^>]*>", tag);
        Regex regex = new Regex(Pattern, RegexOptions.IgnoreCase);
        foreach (Match CurMatch in regex.Matches(source))
            Results.Add(CurMatch.Value);
        return Results;
    }

    // Returns the value of the attribute tagName (double-quoted, single-quoted or unquoted)
    private string GetValue(string source, string tagName)
    {
        Regex regex = new Regex(tagName + "\\s*=\\s*((?:\\\"(?<value>[^\\\"]*)\\\")|(?:'(?<value>[^']*)')|(?<value>[^\\s|>]*))", RegexOptions.IgnoreCase);
        return regex.IsMatch(source) ? regex.Match(source).Groups["value"].Value : String.Empty;
    }
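The hidden-field approach from this comment can be checked offline against a small sample form. What follows is a rough sketch under several assumptions: the class and method names (HiddenFieldDemo, ExtractTags, GetAttribute, BuildPostData) and the HTML are hypothetical, the patterns are a best-guess reading of the regexes above, the type attribute is compared exactly instead of searching for the word "hidden", and Uri.EscapeDataString stands in for HttpUtility.UrlEncode so that no System.Web reference is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HiddenFieldDemo
{
    // Returns every opening tag of the given kind, e.g. all <input ...> tags.
    public static List<string> ExtractTags(string source, string tag)
    {
        List<string> results = new List<string>();
        string pattern = String.Format(@"<{0}[^>]*>", Regex.Escape(tag));
        foreach (Match m in Regex.Matches(source, pattern, RegexOptions.IgnoreCase))
            results.Add(m.Value);
        return results;
    }

    // Reads one attribute value; accepts double-quoted, single-quoted or bare values.
    public static string GetAttribute(string tagText, string attribute)
    {
        Match m = Regex.Match(tagText,
            attribute + "\\s*=\\s*(?:\"(?<value>[^\"]*)\"|'(?<value>[^']*)'|(?<value>[^\\s>]*))",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["value"].Value : String.Empty;
    }

    // Builds a URL-encoded form body from all hidden inputs in the page.
    public static string BuildPostData(string html)
    {
        List<string> parts = new List<string>();
        foreach (string input in ExtractTags(html, "input"))
        {
            if (GetAttribute(input, "type").ToLowerInvariant() != "hidden")
                continue;
            parts.Add(Uri.EscapeDataString(GetAttribute(input, "name")) + "=" +
                      Uri.EscapeDataString(GetAttribute(input, "value")));
        }
        return String.Join("&", parts.ToArray());
    }

    public static void Main()
    {
        // Made-up login form; the real page has more fields.
        string html = "<form><input type=\"hidden\" name=\"GA3T\" value=\"abc123\">" +
                      "<input type='hidden' name='GALX' value='xyz789'>" +
                      "<input type=\"text\" name=\"Email\"></form>";

        Console.WriteLine(BuildPostData(html)); // prints "GA3T=abc123&GALX=xyz789"
    }
}
```

Posting back whatever hidden fields the page contains means the code should keep working if Google adds or renames one of them, at the cost of relying on regex-based HTML parsing.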