Robots.txt scanning differences between Metasploit and Nmap

Metasploit has an auxiliary scanner dedicated to robots.txt files. I was interested in comparing this Metasploit auxiliary scanner with the Nmap robots.txt NSE script.

I decided to scan a /19 range, which represents 8192 IP addresses, with the two tools, and then compare the results and the time taken by each scan.

Metasploit

By default, the Metasploit “scanner/http/robots_txt” auxiliary scanner is configured with 50 threads. You can increase the number of threads by setting the THREADS option; we set THREADS to 256.
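
For reference, a typical msfconsole session for this scan would look roughly like the following (the xxx.xxx.xxx.0/19 value stands in for the anonymized range we scanned):

msf > use auxiliary/scanner/http/robots_txt
msf auxiliary(robots_txt) > set RHOSTS xxx.xxx.xxx.0/19
msf auxiliary(robots_txt) > set THREADS 256
msf auxiliary(robots_txt) > run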

Run from the console, Metasploit took around 40 seconds to scan all 8192 IP addresses and returned 41 responses.

Example of output:

[*] [xxx.xxx.xxx.xxx] /robots.txt found
[*] [xxx.xxx.xxx.xxx] /robots.txt - /database/, /includes/, /misc/, /modules/, /sites/, /themes/, /scripts/, /updates/, /profiles/, /xmlrpc.php, /cron.php, /update.php, /install.php, /INSTALL.txt, /INSTALL.mysql.txt, /INSTALL.pgsql.txt, /CHANGELOG.txt, /MAINTAINERS.txt, /LICENSE.txt, /UPGRADE.txt, /admin/, /comment/reply/, /contact/, /logout/, /node/add/, /search/, /user/register/, /user/password/, /user/login/, /?q=admin/, /?q=comment/reply/, /?q=contact/, /?q=logout/, /?q=node/add/, /?q=search/, /?q=user/password/, /?q=user/register/, /?q=user/login/

Nmap

With Nmap, the following command lets you scan for robots.txt files:

time sudo nmap --script=robots.txt -p80 -PN -T4 -oN xxx.xxx.xxx.xxx_19.txt xxx.xxx.xxx.0/19

Nmap took around 11 minutes to scan all 8192 IP addresses and returned 38 responses.

Example of output:

Nmap scan report for toto.sploit.com (xxx.xxx.xxx.xxx)
Host is up (0.040s latency).
PORT   STATE SERVICE
80/tcp open  http
| robots.txt: has 38 disallowed entries (15 shown)
| /database/ /includes/ /misc/ /modules/ /sites/ /themes/
| /scripts/ /updates/ /profiles/ /xmlrpc.php /cron.php /update.php
|_/install.php /INSTALL.txt /INSTALL.mysql.txt

Depending on the verbosity level you give Nmap, the complete list of robots.txt disallowed entries will be displayed.
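
For example, adding -v to the previous command should be enough to print the full list of disallowed entries instead of the truncated one:

time sudo nmap -v --script=robots.txt -p80 -PN -T4 -oN xxx.xxx.xxx.xxx_19.txt xxx.xxx.xxx.0/19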

At first glance, we might think that Metasploit is faster than Nmap at gathering all the robots.txt files. Metasploit discovered 41 robots.txt files and Nmap 38. But if you take a look at the following matrices, you will see that a total of 44 robots.txt files were actually discovered: 3 missed by Metasploit and 6 missed by Nmap. In most cases, the two tools did not miss the same files.

A missed robots.txt file is marked with a 0 in the matrix, a discovered one with a 1. The “robots.txt” column represents the tests done with a basic web browser: 1 for existing files, 0 for non-existing or inaccessible files.

We have case A, which is consistently missed by Nmap, and by Nmap only. The following robots.txt entries are missed:

User-agent: *
Disallow:

A robots.txt file exists, but because the Disallow directive doesn't contain any entries, the NSE script doesn't report a match.
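
You can verify a case A host by hand; a simple wget call against one of the anonymized hosts returns the two lines shown above, confirming the file does exist:

# -q silences wget, -O- prints the fetched file to stdout
wget -qO- http://xxx.xxx.xxx.xxx/robots.txt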

We have case C, which is also missed by Nmap only:

User-agent: *
Disallow: /

This case looks like case B, but Metasploit finds it.

Finally, we have case X, which is detected by Nmap, not detected by Metasploit, and also not accessible with a traditional web browser or the wget command line. A 404 Apache error code is returned, yet Nmap still reports some robots.txt entries.
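
One way to double-check a case X host is to ask wget for the server headers without downloading the file; this is where the 404 described above shows up:

# --spider makes no download, -S prints the HTTP response headers (a 404 here)
wget -S --spider http://xxx.xxx.xxx.xxx/robots.txt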