Posted by Royh

Planning and executing SEO strategies for sites with hundreds of millions of pages is no easy task, but there are strategies to make it simpler.

Programmatic pages are pages generated automatically at very large scale. SEO strategies for these pages target multiple keyword variations by creating landing pages at that scale, without manual work on each page.

You’ll typically find these pages in major verticals like e-commerce, real estate, travel, and informational sites. These verticals rely on programmatic pages to build their SEO strategy, with a dedicated page for each product and category. This setup can lead to hundreds of millions of pages. They’re efficient, functional, and user-friendly; however, they do come with some SEO challenges.

In my experience, the comprehensive SEO strategy covered in this post works best when tailored to a large site with programmatic pages. Many tactics that work for sites with only a few hundred pages won't necessarily get the same results on larger sites. Small sites rely on manual, meticulous content creation, while on programmatic sites the automatically generated pages are the main traffic drivers.

So, let’s get down to business! I’ll explore the four major SEO challenges you'll encounter when dealing with programmatic pages, and unpack how to overcome them.

1. Keyword research and keyword modifiers

Well-planned keyword research is one of the biggest challenges when operating on a programmatic scale. When working on a sizable set of pages and keywords, it’s important to choose and find the right keywords to target across all pages.

In order to function both efficiently and effectively, it’s recommended that you divide site pages into a few templates before digging into the research itself. Some examples of these templates could include:

  • Categories
  • Sub-categories
  • Product pages
  • Static pages
  • Blogs
  • Informational pages
  • Knowledge base/learning

Once you have all the page templates in place, it's time to build keyword buckets and keyword modifiers.

Keyword modifiers are additional keywords that, once combined with your head terms and core keywords, support a long-tail strategy. For example, modifiers for the head term “amazon stock” can be anything related to market share, statistics, insights, etc.

Programmatic pages typically hold the majority of the site's pages. (Take Trulia, for example, which has over 30 million indexed pages — the majority of which are programmatic.) As a result, those pages are usually the most important on a larger website, both in terms of volume and search opportunity. Thus, you must ensure the use of the right keyword modifiers across each page template’s content.

Of course, you can’t go over every single page and manually modify the SEO tags. Imagine a website like Pinterest trying to do that; they’d never finish! On a site with 30-100 million pages, it’s impossible to optimize each one individually. That's why you need to make changes across whole sets of pages and categories: come up with the right keyword modifiers to implement across your various page templates so you can handle the task efficiently in bulk.

The main difference here, compared to typical keyword research, is the focus on keyword modifiers. You have to find relevant keywords that can be repeatedly implemented across all relevant pages.
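As a minimal sketch of this idea (the head terms and modifiers below are illustrative, not from any real research), long-tail candidates can be generated by combining head terms with keyword modifiers:

```python
# Sketch: combine head terms with keyword modifiers to generate
# long-tail keyword candidates. All terms here are illustrative.
from itertools import product

head_terms = ["amazon stock", "tesla stock"]
modifiers = ["price", "market cap", "statistics", "insights"]

def build_long_tail(heads, mods):
    """Return every 'head term + modifier' combination."""
    return [f"{head} {mod}" for head, mod in product(heads, mods)]

keywords = build_long_tail(head_terms, modifiers)
print(keywords[:3])
# ['amazon stock price', 'amazon stock market cap', 'amazon stock statistics']
```

Each candidate would then be checked against search volume data to decide which modifiers are worth rolling out across a template.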

Let's take a look at this use case on a stock investment website:

The example above shows a website that targets users/investors with informational intent, and that relies on programmatic pages for its SEO strategy. I found the keyword modifiers by conducting keyword research and competitor research.

I researched several relevant, leading websites using Moz’s Keyword Explorer and SimilarWeb’s Search Traffic feature, and noted the most popular keyword groups. After I’d accumulated the keyword groups, I found the search volume of each keyword to determine which ones would be the most popular and relevant to target.

Once you have the keyword modifiers, implement them across the title tags, descriptions, headline tags, and on-page content of the page template(s) they belong to. Even multiplied across millions of pages, the right keyword modifiers make updating your programmatic pages a much easier and more efficient process.

If you have a template of pages organized around a specific topic, you can update all pages on that topic at once: for example, a stock information site with a particular type of stock page, or a category of stocks grouped by price or industry. One update affects every page built from that template, so if you update the SEO title tag of the stock-page template, all pages in the same category are updated as well.
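A simple way to picture template-level updates is a single title-tag template per page type, so one change propagates to every page rendered from it. This is only a sketch; the templates and field names below are hypothetical:

```python
# Sketch: one title-tag template per page type. Editing a template
# updates every page rendered from it. Templates/fields are hypothetical.
TITLE_TEMPLATES = {
    "stock_page": "{ticker} Stock Price, Market Cap & Insights",
    "category_page": "Top {industry} Stocks by {sort_key}",
}

def render_title(page_type, **fields):
    """Render the SEO title for a page from its template."""
    return TITLE_TEMPLATES[page_type].format(**fields)

print(render_title("stock_page", ticker="AMZN"))
# -> AMZN Stock Price, Market Cap & Insights
```

Changing the `stock_page` string once would restyle the title tag of every stock page on the site.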

In the example above, the intent of the keywords is informational; the goal is to match keyword modifiers to what searchers actually want. We’re targeting searchers who are looking to gather insights: more information about stocks or companies, market caps, expert evaluations, market trends, etc. In this case, it's recommended to add keywords that include questions such as “how?”, “what?”, and “which?”.

As another example, transactional keywords — which are a better fit for e-commerce and B2C websites — are highly effective for addressing searches with purchase intent. These terms can include “buy”, “get”, “purchase”, and “shop”.

2. Internal linking

Smart internal linking is vital for large sites. It can significantly increase the number of indexed pages and pass link equity between pages. When you work on massive sites, one of your main priorities should be making sure Google discovers and indexes your site’s pages.

So, how should you go about building those internal linking features?

When looking at the big picture, the goal is that Page A links to Page B and Page C, while Page B links to Page D and Page E, and so on. Ideally, each page gets at least one link from a different indexed page on the site. For programmatic sites, the challenge is that new pages emerge daily, so in addition to covering existing pages, you need to plan ahead so new pages receive internal links as soon as they go live. This helps those pages get discovered quickly and indexed properly.

Related pages and “people also viewed”

One strategy that makes link building easier is adding a "related pages" section to the site. It adds value for the user and the crawlers, and also links to relevant pages based on affinity.

You can link to similar content based on category, product type, content, or just about any other descriptive element. Similar content should be sorted in numeric order or alphabetical order.
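One way to sketch such a related-pages section, under the assumption that each page record carries a category and title (the data and field names below are hypothetical), is to link pages that share a category, sorted alphabetically and capped at a fixed number of links:

```python
# Sketch: pick "related pages" for a page by shared category,
# sorted alphabetically, capped at `limit` links. Data is illustrative.
def related_pages(page, all_pages, limit=5):
    """Return up to `limit` pages in the same category, alphabetized."""
    same_category = [
        p for p in all_pages
        if p["category"] == page["category"] and p["url"] != page["url"]
    ]
    return sorted(same_category, key=lambda p: p["title"])[:limit]

pages = [
    {"url": "/stocks/amzn", "title": "Amazon", "category": "tech"},
    {"url": "/stocks/goog", "title": "Alphabet", "category": "tech"},
    {"url": "/stocks/tsla", "title": "Tesla", "category": "auto"},
]
print([p["title"] for p in related_pages(pages[0], pages)])  # ['Alphabet']
```

Because the selection rule is deterministic, the same widget works unchanged whether the site has three pages or thirty million.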

HTML sitemap

Yes, even large websites use HTML sitemaps to help crawlers find new pages. They’re extremely effective on large-scale sites with millions of pages.

Let’s take a look at this example from the Trulia HTML sitemap (shown above): Trulia built their HTML sitemap based on alphabetical order, and in a way that ensures all pages have links. This way, there won't be any orphan pages, which helps their goal of supplying link juice to all pages that they wish to index.

In general, many e-commerce and real estate websites sequence their sitemaps in alphabetical or categorical order to guarantee that no page is left unlinked.
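The alphabetical-sitemap idea can be sketched as a simple bucketing step: group every page title under its first letter, so each sitemap section links out to every page in that bucket and no page is orphaned. This is an illustration, not Trulia's actual implementation:

```python
# Sketch: bucket page titles alphabetically into HTML sitemap sections,
# so every page gets at least one internal link and none are orphaned.
from collections import defaultdict

def bucket_alphabetically(titles):
    """Group sorted titles under their first letter."""
    buckets = defaultdict(list)
    for title in sorted(titles):
        buckets[title[0].upper()].append(title)
    return dict(buckets)

print(bucket_alphabetically(["austin", "boston", "atlanta"]))
# {'A': ['atlanta', 'austin'], 'B': ['boston']}
```

Each bucket would then be rendered as one section (or one paginated page) of the HTML sitemap.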

3. Crawl budget and deindexing rules

Crawl budget is a very important issue that large websites need to consider. When you have tens of millions of programmatic pages, you need to make sure Google consistently finds and indexes your most valuable pages. The value of your pages should be based on content, revenue, business value, and user satisfaction.

First, choose which pages should not be indexed:

  1. Use your favorite analysis tool to discover which pages have the lowest engagement metrics (high bounce rates, low averages of time on site, no page views, etc.).
  2. Use Google Search Console to discover which pages have high impressions and low CTRs.
  3. Combine these pages into one list.
  4. Check to see if they have any incoming links.
  5. Analyze the attribution of those pages to revenue and business leads.
  6. Once you have all of the relevant data and have chosen the pages that should be removed from the index, add a noindex tag to all of them and exclude them from the XML sitemap.
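The checklist above can be sketched as a small filtering step: merge the low-engagement and low-CTR lists, then drop any page that has inbound links or drives revenue. All thresholds, field names, and URLs below are hypothetical:

```python
# Sketch of the deindexing checklist: merge low-engagement and low-CTR
# page lists, then keep only pages with no inbound links and no revenue.
# All data and field names are hypothetical.
def noindex_candidates(low_engagement, low_ctr, backlinks, revenue):
    """Return URLs that are safe candidates for a noindex tag."""
    candidates = set(low_engagement) | set(low_ctr)
    return sorted(
        url for url in candidates
        if backlinks.get(url, 0) == 0 and revenue.get(url, 0.0) == 0.0
    )

urls = noindex_candidates(
    low_engagement=["/p/1", "/p/2"],
    low_ctr=["/p/2", "/p/3"],
    backlinks={"/p/3": 12},   # /p/3 has inbound links, so it stays indexed
    revenue={"/p/1": 0.0},
)
print(urls)  # ['/p/1', '/p/2']
```

The surviving URLs would get a noindex tag and be removed from the XML sitemap.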

I work for SimilarWeb, a website with over 100 million pages, and I ran a no-index test on over 20 million pages based on the checklist above. I wanted to see the impact of removing a high number of pages from our organic channel.

The results were incredible.

Although we lost over half a million visits over the course of a month, the overall engagement metrics on programmatic pages improved dramatically.



By removing irrelevant pages, I made more room for relevant and valuable pages to be discovered by the Google bot.

Rand Fishkin also has a really comprehensive checklist, which shows you how to determine if a page is low quality according to Google. Another great example is Britney Muller’s experiment, where she deindexed 75% of Moz’s pages with great results.

4. SEO split testing

Test everything! The advantage of working on a large-scale SEO campaign is that you have access to big data you can utilize for your SEO efforts. Unlike regular A/B testing, which measures human behavior, SEO split testing is aimed purely at crawlers.

The split testing process is usually based on the same or similar page templates. Split the pages into two or three groups: one group acts as a control, while the change is enabled on the other groups. Test the following criteria:

  • Adding structured data
  • Changing the keyword modifier of SEO tags (title tag, description, H tags, etc.)
  • Image ALT tags
  • Content length
  • Page performance
  • Internal linking

In terms of measuring performance, I recommend running one experiment at a time. For instance, you might adjust SEO tags first, and then continue testing other elements after you’ve built some confidence.
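A common way to split pages for this kind of test is to assign each URL to a bucket deterministically by hashing it, so the same page always lands in the same group between crawls. This is a generic sketch, not the tooling any of the companies below actually use; the salt and group names are made up:

```python
# Sketch: deterministic control/test assignment by hashing the URL,
# so the split stays stable across crawls. Salt/groups are hypothetical.
import hashlib

def split_group(url, groups=("control", "test"), salt="exp-title-tags"):
    """Map a URL to a stable experiment group."""
    digest = hashlib.md5(f"{salt}:{url}".encode()).hexdigest()
    return groups[int(digest, 16) % len(groups)]

assignment = {u: split_group(u) for u in ["/stocks/amzn", "/stocks/tsla"]}
```

Because the assignment depends only on the URL and the experiment salt, you can re-derive the groups at analysis time without storing them anywhere.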

Diving into a split testing example, let’s look at Etsy. Etsy wanted to test which title tag would rank higher, drive better CTR, and generally improve organic traffic to the tested pages. In the image below, we can see how they ran the split test between control pages with default title tags and test pages with different tag variations.

Pinterest’s dashboard also shows how their growth team relies on split testing experiments for their SEO strategy. Pinterest’s goal was to build an experimentation tool that would help them measure the impact of SEO changes to their rankings and organic traffic.

Now it’s your turn

Since programmatic pages are different from most others, it’s imperative that you build and optimize them the right way. This requires several adjustments to your normal SEO strategy, along with new, purpose-built tactics. The benefit of the approach outlined above is that every improvement compounds across the full scale of the site.

Each programmatic page is supposed to fit its search query, whether that’s a product search, an address, or a request for information. That’s why it’s crucial to make the content as unique as possible and to give the user the best answer for each query.

Once you grasp the four tactics above, you’ll be able to implement them into your SEO strategy and begin seeing better results for your programmatic pages.
