Writing A Proper Robots.txt For WordPress

Scott Hartley (https://www.sertmedia.com)
My name is Scott Hartley. I am from Nashville, TN, and I love to cover topics related to the latest tech trends, social media, and cars!

If you search the internet for a properly configured robots.txt file, you will find a lot of different guides. Many have not been updated in years, and many either give you false information or simply do not block what should be blocked.

For instance, the Yoast approach is very minimal and has not been updated in many years, while other websites suggest relying on the virtual file that WordPress provides. However, both of these methods lead to a poor experience for the website crawler.

WordPress does include a virtual robots.txt file that blocks the bare minimum, including wp-admin, but it misses many important items that should be included. The virtual robots.txt exists for webmasters who would otherwise be unsure how to make or edit one. It is not meant for more complex websites with hundreds or thousands of pages of content. If you run a more complex website, then you will want to block any page that is potentially useless to the crawler so that it focuses on your content.

The fact is that WordPress produces a lot of pages that are either useless to the search engine or will spit out a ton of errors if a crawler tries to access them. Remember, we want to keep things simple for the crawler while it is on our website, and we want it to focus on our content.

Below is what Yoast uses:
User-Agent: *
Disallow: /wp-content/plugins/
Disallow: /out/
Disallow: /bugs/
Disallow: /suggest/
Allow: /wp-content/plugins/vipers-video-quicktags/resources/jw-flv-player/player.swf

Here is what we are using:

User-agent: *
Disallow: /wp-login.php
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: /*?s
Disallow: /search/
Allow: /wp-content/uploads/

Sitemap: http://www.thedailyexposition.com/sitemap_index.xml
Sitemap: http://www.thedailyexposition.com/post-sitemap.xml
Sitemap: http://www.thedailyexposition.com/page-sitemap.xml
Sitemap: http://www.thedailyexposition.com/category-sitemap.xml
Sitemap: http://www.thedailyexposition.com/author-sitemap.xml
Sitemap: http://www.thedailyexposition.com/forum-sitemap.xml
Sitemap: http://www.thedailyexposition.com/topic-sitemap.xml
Sitemap: http://www.thedailyexposition.com/product_cat-sitemap.xml
Sitemap: http://www.thedailyexposition.com/product_tag-sitemap.xml
Sitemap: http://www.thedailyexposition.com/post_tag-sitemap.xml
Sitemap: http://www.thedailyexposition.com/topic_tag-sitemap.xml
Sitemap: http://www.thedailyexposition.com/product-sitemap.xml

We block general access to the wp-content folder but allow the crawler to reach our uploads, which include all of our images and other files used in a post or by WooCommerce. All forms of login are also blocked, as is search. This is because every time a query is typed, WordPress creates a landing page, and these are typically worthless and do nothing but waste the crawler's time. Trackbacks are blocked too: they serve no real purpose for the crawler, and most WordPress installations disable the feature anyway because it slows your website down and wastes server resources.
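To see why the Allow line for uploads can coexist with the broader Disallow on wp-content, here is a minimal Python sketch. It assumes the crawler uses longest-match precedence (the most specific rule wins, and Allow wins a tie), which is how the major search engines describe their behavior; other crawlers may evaluate rules differently. The function names and the trimmed-down rule list are illustrative only and are not part of WordPress or any library.

```python
import re

# A subset of the rules from the robots.txt above, as (directive, pattern) pairs.
RULES = [
    ("disallow", "/wp-login.php"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-content/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/*?s"),
    ("disallow", "/search/"),
    ("allow", "/wp-content/uploads/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = directive == "allow"
            # The longer (more specific) pattern wins; Allow wins a tie.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


if __name__ == "__main__":
    for path in (
        "/wp-content/uploads/2019/10/photo.jpg",
        "/wp-content/plugins/example/script.js",
        "/?s=robots",
        "/2019/10/my-post/",
    ):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Note that Python's built-in urllib.robotparser applies rules in file order and does not expand wildcards, so it can disagree with the result above for files that rely on either behavior.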

I have also taken the liberty of including not only the sitemap index but also the individual sitemaps that the index lists. This makes it easier for the crawler to identify the sitemaps, and it will check the index to match the files up. This way, if you miss one in the list, the crawler will still find it through the index.
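If you want to double-check which sitemaps a robots.txt file actually advertises, the following is a small sketch using Python's standard urllib.robotparser module (the site_maps() method requires Python 3.8 or newer). The shortened ROBOTS_TXT sample and the listed_sitemaps helper are illustrative assumptions, not part of the live file.

```python
from urllib.robotparser import RobotFileParser

# Shortened sample for illustration; paste in your own file's contents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.thedailyexposition.com/sitemap_index.xml
Sitemap: http://www.thedailyexposition.com/post-sitemap.xml
Sitemap: http://www.thedailyexposition.com/page-sitemap.xml
"""


def listed_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt string, in file order."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    for url in listed_sitemaps(ROBOTS_TXT):
        print(url)
```

Comparing this list against the entries in your sitemap index is a quick way to spot a sitemap you forgot to list.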
