Web Server robots.txt Information Disclosure

Introduction

The remote web server hosts a file named 'robots.txt', which tells web 'robots' (crawlers) which directories of the site they should not visit, typically for maintenance or indexing purposes. Because the file is publicly readable and its rules are only advisory, a malicious user can read it to learn of sensitive documents or directories on the affected site and then either retrieve them directly or target them for other attacks.

How to Test

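The robots.txt file is served publicly from the root of the site. Request it by appending /robots.txt to the site's base URL (for example, https://example.com/robots.txt) and check whether the server returns it. If it does, review its Disallow entries for paths that point to sensitive documents or directories.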
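The check above can be scripted. The sketch below is in Python (an assumption; any HTTP client works) and uses only the standard library: it builds the robots.txt URL for a site, fetches it, and lists the non-empty Disallow paths, which are the entries most likely to reveal sensitive material. The function names are hypothetical, and the parser is deliberately simplified: it ignores User-agent groups and wildcard syntax.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from urllib.parse import urljoin


def robots_url(base_url: str) -> str:
    """robots.txt always lives at the site root, whatever path base_url has."""
    return urljoin(base_url, "/robots.txt")


def fetch_robots(base_url: str, timeout: float = 10.0) -> str | None:
    """Return the robots.txt body, or None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(robots_url(base_url), timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        return None  # e.g. 404: the file is not published


def disallowed_paths(robots_text: str) -> list[str]:
    """List the non-empty Disallow values in file order (simplified parser)."""
    paths = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/db.sql  # old dump
Allow: /public/
Disallow:
"""
print(disallowed_paths(sample))  # ['/admin/', '/backup/db.sql']
```

Against a real target, pass the result of fetch_robots("https://example.com") to disallowed_paths, then request each reported path directly to see whether it is reachable without authentication.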

How to Fix

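Review the contents of the site's robots.txt file and remove entries that reveal sensitive documents or directories. To keep pages out of search results, use robots META tags (or the equivalent X-Robots-Tag HTTP header) instead of entries in the robots.txt file; crawlers can only see these directives on pages they are allowed to fetch. Neither mechanism stops anyone from requesting a page, so also adjust the web server's access controls to limit access to sensitive material.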
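Access controls are normally configured in the web server itself. As a hedged illustration of the idea, the Python WSGI middleware below refuses requests to a set of protected path prefixes unless a caller-supplied check approves them. PROTECTED_PREFIXES, is_protected, require_auth and the REMOTE_USER-based policy are hypothetical placeholders, not part of any framework. Unlike a robots.txt entry, this actually blocks retrieval.

```python
import posixpath
from typing import Callable, Dict, Iterable

# Hypothetical prefixes for sensitive material; replace with the site's own paths.
PROTECTED_PREFIXES = ("/admin", "/backup")


def is_protected(path: str) -> bool:
    # Normalize first so "/public/../admin" or "//admin" cannot slip past the check.
    clean = "/" + posixpath.normpath(path or "/").lstrip("/")
    return any(clean == p or clean.startswith(p + "/") for p in PROTECTED_PREFIXES)


def require_auth(app: Callable, is_authorized: Callable[[Dict], bool]) -> Callable:
    """WSGI middleware: answer 403 on protected paths unless is_authorized(environ)."""
    def wrapper(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        if is_protected(environ.get("PATH_INFO", "")) and not is_authorized(environ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def site(environ, start_response):
    # Stand-in for the real application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


# Hypothetical policy: only a server-authenticated user named "admin" gets through.
app = require_auth(site, lambda env: env.get("REMOTE_USER") == "admin")
```

For local experiments, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves this app; in production the same rule is usually expressed in the web server's own configuration.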

HTML

<!-- Place inside the <head> element of each page that should not be indexed -->
<meta name="robots" content="noindex, nofollow">

PHP

<?php
// Must run before any output is sent to the client, or the header cannot be added.
header('X-Robots-Tag: noindex, nofollow');
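For sites not written in PHP, the same header can be added in application code. The sketch below assumes a Python WSGI application (an assumption; the original shows only HTML and PHP) and wraps it in a hypothetical add_robots_header middleware so that every response carries X-Robots-Tag. Because the directive travels in an HTTP header, it also covers non-HTML files such as PDFs, where a META tag cannot be placed.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def add_robots_header(app: Callable, value: str = "noindex, nofollow") -> Callable:
    """WSGI middleware: append an X-Robots-Tag header to every response from app."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_header(status: str, headers: List[Tuple[str, str]],
                              exc_info: Optional[tuple] = None):
            headers = list(headers) + [("X-Robots-Tag", value)]
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)
        return app(environ, start_with_header)
    return wrapper


def pdf_app(environ, start_response):
    # Hypothetical app serving a PDF, where an HTML <meta> tag is not an option.
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.4 ..."]


app = add_robots_header(pdf_app)
```

Wrapping at the middleware level keeps the directive out of individual handlers, so new routes inherit it automatically.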

