The robots.txt file
The robots.txt file matters because it tells web robots which parts of your site they may visit. Web robots (also called spiders or crawlers) are programs that browse the Internet; search engines such as Google, Yahoo and others use them to index web pages. Before a well-behaved robot visits a website, say http://web-hosting-hints.net, it first checks http://web-hosting-hints.net/robots.txt.
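As a rough illustration of that first step, the small Python sketch below works out where a robot would look for robots.txt, given any page on a site. The function name robots_url and the example page path are assumptions made for this illustration, not part of any real crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # robots.txt sits at the root of the scheme + host, whatever the page path is.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path here is made up for the example.
print(robots_url("http://web-hosting-hints.net/some/page.html"))
```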
The robots.txt file is also used to keep spiders out of parts of your website that you don't want indexed, for example your /admin pages or any other information you don't want to share with search engines.
You can also block specific robots, rather than all of them, from indexing your website. Here are some examples of what you find in robots.txt files:
If you want to exclude all robots:
User-agent: *
Disallow: /
To allow all robots to index your pages:
User-agent: *
Disallow:
If you want to block only some directories:
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /softwares/
If you want to block only one robot (here called nogoodbot):
User-agent: nogoodbot
Disallow: /
If you only want to allow Google to index your pages:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
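If you want to check how rules like these are read before you upload them, one option is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name allowed, the robot name otherbot and the example page paths are assumptions, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Block only some directories for every robot.
BLOCK_DIRS = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""

# Block a single robot and leave everyone else alone.
BLOCK_ONE_ROBOT = """\
User-agent: nogoodbot
Disallow: /
"""

# Allow only Googlebot; an empty Disallow line means nothing is blocked.
ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
"""

site = "http://web-hosting-hints.net"
print(allowed(BLOCK_DIRS, "otherbot", site + "/admin/login.html"))
print(allowed(BLOCK_DIRS, "otherbot", site + "/index.html"))
print(allowed(BLOCK_ONE_ROBOT, "nogoodbot", site + "/index.html"))
print(allowed(BLOCK_ONE_ROBOT, "otherbot", site + "/index.html"))
print(allowed(ONLY_GOOGLEBOT, "Googlebot", site + "/index.html"))
print(allowed(ONLY_GOOGLEBOT, "otherbot", site + "/index.html"))
```

Keep in mind that a check like this only tells you what a robot that obeys robots.txt would do; the file is a request, not an access control.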