How to Control Search Engine Robots

17 May

by Michael Rock

www.TheInternetPresence.com

Wouldn't it be nice to be able to leave some code in your web site that tells the search engine crawlers to make your site number one? Unfortunately, a robots.txt file or robots meta tag won't do that, but they can help crawlers index your site better and keep unwanted ones out.

First, a few definitions:

Search Engine Spiders or Crawlers - A web crawler (also known as a web spider) is a program that browses the World Wide Web in a methodical, automated manner. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches.

A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, recursively browsing the Web according to a set of policies.
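
The visiting loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-page site in PAGES is made up and held in memory so that nothing is fetched from the network, and the names LinkCollector and crawl are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": path -> HTML. A real crawler would fetch pages over HTTP.
PAGES = {
    "/": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "/about.html": '<a href="/">Home</a>',
    "/news.html": '<a href="/about.html">About</a> <a href="/missing.html">Old</a>',
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Visit pages breadth-first and return them in the order visited."""
    to_visit = deque([start])  # the list of URLs still to visit
    seen = {start}             # every URL ever added, so none is queued twice
    visited = []
    while to_visit:
        url = to_visit.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # page does not exist, so there is nothing to process
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return visited


print(crawl("/"))
```

The set of already-seen URLs is what keeps the crawler from looping forever when pages link back to each other, as the home page and the about page do here.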

Robots.txt - The robots exclusion standard or robots.txt protocol is a convention to prevent well-behaved web spiders and other web robots from accessing all or part of a website. The information specifying the parts that should not be accessed is specified in a file called robots.txt in the top-level directory of the website.

The robots.txt protocol is purely advisory, and relies on the cooperation of the web robot, so that marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many web site administrators have been caught out trying to use the robots file to make private parts of a website invisible to the rest of the world. However the file is necessarily publicly available and is easily checked by anyone with a web browser.

The robots.txt patterns are matched by simple comparison against the start of the URL path, so take care that a pattern meant to match a directory ends with the final '/' character; otherwise every file whose path starts with that text will match, rather than just those in the intended directory.
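
The effect of the trailing '/' can be checked with Python's standard-library urllib.robotparser module, which applies the same starts-with comparison. This is a small sketch, not a model of any particular search engine: the /private directory, the file names, and the helper name allowed are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(disallow_path, url_path):
    """Return True if 'Disallow: <disallow_path>' for all crawlers still permits url_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return parser.can_fetch("*", url_path)


# Without the trailing slash, a file whose name merely starts with "/private" is caught too.
print(allowed("/private", "/private-notes.html"))   # False
print(allowed("/private/", "/private-notes.html"))  # True
print(allowed("/private/", "/private/page.html"))   # False
```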

Meta Tag - Meta tags are used to provide structured data about a web page (data about data).
In the early 2000s, search engines veered away from relying on Meta tags, because many web sites used inappropriate keywords or stuffed their tags with keywords to obtain any and all traffic possible.

Some search engines, however, still take Meta tags into some consideration when delivering results. In recent years, search engines have become smarter, penalizing websites that are cheating (by repeating the same keyword several times to get a boost in the search ranking). Instead of going up rankings, these websites will go down in rankings or, on some search engines, will be kicked off of the search engine completely.

Index a site - The act of crawling your site and gathering information.

How can the robots.txt file and meta tag help you?

In the robots.txt file you can tell harmful web crawlers to leave your web site alone, and give helpful hints to the ones you want to crawl your site. Here is an example of how to stop a web crawler from crawling your site:

# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

ia_archiver is the crawler name for the Wayback Machine, which you may have heard of, and the / after Disallow tells ia_archiver not to index any of your site. A line that begins with # is a comment you write to yourself so you can keep track of what you typed.

Type the above three lines into Notepad on your computer and save the file to the root directory of your web site as robots.txt. Well-behaved web crawlers look for this file at a web site before doing anything else. This helps the crawler do its job, and lets the web site owner tell the spider what to do. If you would rather have the robots.txt file created for you, visit http://www.rietta.com/robogen.

Say, for instance, you have some data that you don't want the crawlers to see (such as duplicate content for other browser referrer pages). You can deter crawlers from indexing the 'duplicate' directory by typing this into your robots.txt file:

User-agent: *
Disallow: /duplicate/

The * after User-agent says that this rule applies to all crawlers, and /duplicate/ after Disallow tells them to ignore this directory and not crawl it. A blank line must separate each group of User-agent and Disallow lines for the file to work correctly. So this is how you would combine the two rules above in one robots.txt file:

# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
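
One way to check a file like this before uploading it is to feed it to Python's standard-library urllib.robotparser and ask which requests it would allow. This is a sketch for illustration only: "ExampleBot" and the page paths are made-up names, and real crawlers may interpret edge cases differently from this module.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the paths below are hypothetical, chosen only to exercise both groups.
checks = [
    ("ia_archiver", "/index.html"),
    ("ExampleBot", "/index.html"),
    ("ExampleBot", "/duplicate/page.html"),
]
for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(agent, path, verdict)
```

With this file, ia_archiver is blocked from every page, while any other crawler is blocked only from the /duplicate/ directory.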

One very important thing to note: anyone can read the robots.txt file of a site. So if there is information that you don't want anyone to see, don't list its location in the robots.txt file. If a directory is not linked to from your web site or from anywhere else, crawlers generally have no way to discover it and won't index it anyway.

Another way to block indexing is to put a robots meta tag into the page itself. It looks like this: <meta name="robots" content="noindex,nofollow">

You put this inside the <head> section of your web page. This line tells crawlers not to index the page and not to follow any of the hyperlinks on it. As another example, <meta name="robots" content="noindex,follow"> tells crawlers not to index the page, but to follow the hyperlinks on it.
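
To see how a crawler might read such a tag, here is a minimal Python sketch using the standard-library html.parser module. It is an assumption about how a simple crawler could work, not the code of any real search engine; the class RobotsMetaParser, the function robots_directives, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated; case and surrounding spaces are ignored here.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# A made-up page that uses the second example tag from the text.
page = '<html><head><meta name="robots" content="noindex,follow"></head><body></body></html>'
directives = robots_directives(page)
print("may index page:", "noindex" not in directives)    # False
print("may follow links:", "nofollow" not in directives)  # True
```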

Did you know that Google has its own meta tag?

It looks like this: <meta name="googlebot" content="noindex,nofollow,noarchive"> This tells Google's crawler not to index the page, not to follow any of the links, and not to store a cached version of the page. The noarchive value is useful if you update the content on your site frequently, because it prevents users from seeing outdated content served from the cache.

You can use the meta tag to specifically talk to Google's robots to avoid complications or if you are optimizing your site for Google's search engine. This concludes this month's article.

Until the next article have a great day!

Copyright © Michael Rock
Internet Presence
www.TheInternetPresence.com
The owner of this registered company has over twenty years' experience with DOS, Windows business applications, numerous programming languages, artistic development, and web design. Other areas of interest include web marketing, web promotion, and business marketing and development. Persuaded by those who praised his work, he decided to go into business for himself and strongly encourages everyone else to do the same.

Internet Presence was founded in 2003 from a desire to become independent. Less than one year later, Internet Presence had accounts in three different states, ranging from a locally owned auto collision repair shop to a glass packaging company that sells its product worldwide.
