When Google has already indexed page 1, page 2, page 3, etc.

Guest

Post by Guest »

Hi SEO people! I have a question. Google has already indexed page 1, page 2, page 3, etc. (on a WooCommerce page, archive page, …). I used robots.txt to block those page URLs, but since they are already indexed they will stay in the index, because with robots.txt I prevent Google from crawling them and so it can’t “deindex” them, right? If I had known this before Google indexed the pages, I could have used robots.txt to keep them from being indexed in the first place…

So what I have to do is unblock them in robots.txt and find another solution. I thought I could switch the page from pagination to infinite scrolling, as this might stop the paginated pages from being generated in the code itself, then wait for Google to crawl and deindex page 1, page 3, etc., then block them with robots.txt, and then set the pages back to normal pagination.

Does this make sense? I’m not sure how to do it otherwise? Anyone?

Thanks for all answers.
Rahim

Post by Rahim »

Use the URL removal tool in GSC (Google Search Console) to remove the URLs from the index, and then block them.
Thane

Post by Thane »

To tell web crawlers not to crawl a specific URL pattern, you can create a robots.txt file with the following content:

User-agent: *
Disallow: /product/postname/

This instructs all web crawlers (matched by the wildcard * in User-agent) not to crawl URLs whose path starts with /product/postname/.

However, it's important to note that the Disallow directive in the robots.txt file is meant for blocking crawlers from accessing specific content on your site, not necessarily for preventing indexing. While most well-behaved crawlers will respect the Disallow directive, it doesn't explicitly tell them not to index the content.
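To make the crawl-versus-index distinction concrete, here is a minimal Python sketch that uses the standard library's urllib.robotparser to ask whether a crawler may fetch a URL under the rule above. The example.com URLs and the helper name is_crawl_allowed are assumptions for illustration only. Note that the answer is only about fetching; it says nothing about whether the URL is in an index.

```python
from urllib.robotparser import RobotFileParser

# The same rule as in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product/postname/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper: it only answers "may this URL be crawled?",
    not "will this URL be indexed?".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for url in ("https://example.com/product/postname/",
                "https://example.com/shop/"):
        print(url, "crawl allowed:", is_crawl_allowed(ROBOTS_TXT, url))
```

A URL that comes back as not allowed can still sit in the index if it was indexed before the rule was added, which is the situation described in the original question.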

To explicitly instruct search engines not to index a page, you can use the noindex directive in the HTML <head> of the individual pages:

<meta name="robots" content="noindex">

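If you have multiple pages with the /product/postname/ pattern, you should include the noindex meta tag in each page's HTML <head>. Keep in mind that a crawler can only see the tag if it is allowed to fetch the page, so the Disallow rule for those URLs has to be removed from robots.txt, at least until the pages have been recrawled and dropped from the index. If the pages stay blocked in robots.txt, the noindex tag is never read.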
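One way to confirm that a page really serves the tag is to parse its HTML and look for a robots meta element. The sketch below uses Python's standard html.parser; the class name RobotsMetaParser and the function has_noindex are made up for this example, and it only inspects meta tags, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print("noindex present:", has_noindex(page))
```

Running this against the HTML of each paginated URL would show whether the tag is in place before relying on Google to pick it up.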
Mike

Post by Mike »

You want Google indexing those pages.