Archived May, 2026.

Old topics published via WordPress embed are missing the X-Robots-Tag: noindex header and canonical tags

Thiago_Mobilon

Hi everyone,

I’ve noticed a strange behavior with old topics that were automatically published from WordPress to Discourse (to be used as the comment section).

Normally, when a post is published this way, Discourse correctly adds the X-Robots-Tag: noindex to the HTTP header and sets the canonical URL pointing back to the WordPress blog post.

However, I discovered that older topics are losing these tags. The noindex header disappears, and the canonical tag is no longer present. Here are some examples of topics where this is happening:

Does anyone know a way to fix this issue?

Keep in mind that I have no way of knowing exactly how many topics have been affected so far, but it looks like it’s quite a few.
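
One way to measure how many topics are affected is to check each topic URL for the two signals described above. The following is only a minimal sketch, not an official Discourse tool: the helper names (`noindex_header?`, `canonical_href`, `embed_seo_status`, `fetch_status`) and the example hosts are made up for illustration. It assumes the header carries a plain directive list such as `noindex` or `noindex, nofollow`, and it does not follow redirects.

```ruby
require "net/http"
require "uri"

# True when an X-Robots-Tag header value contains the noindex directive.
def noindex_header?(header_value)
  return false if header_value.nil?
  header_value.downcase.split(",").map(&:strip).include?("noindex")
end

# Returns the href of the first <link rel="canonical"> tag, or nil if none.
def canonical_href(html)
  html.scan(/<link\b[^>]*>/i).each do |tag|
    next unless tag =~ /\brel\s*=\s*["']canonical["']/i
    match = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i)
    return match[1] if match
  end
  nil
end

# True when the canonical URL has a host other than the Discourse host.
def off_site?(href, discourse_host)
  return false if href.nil?
  host = URI.parse(href).host
  !host.nil? && host != discourse_host
rescue URI::InvalidURIError
  false
end

# Summarises the two signals for one response.
def embed_seo_status(header_value, html, discourse_host)
  href = canonical_href(html)
  {
    noindex: noindex_header?(header_value),
    canonical: href,
    canonical_points_off_site: off_site?(href, discourse_host)
  }
end

# Hypothetical network helper: fetches one topic URL and summarises it.
def fetch_status(topic_url)
  uri = URI(topic_url)
  response = Net::HTTP.get_response(uri)
  embed_seo_status(response["x-robots-tag"], response.body.to_s, uri.host)
end
```

A healthy embedded topic should report `noindex: true` and `canonical_points_off_site: true`. A topic showing the problem described here would report `noindex: false`, with a canonical that is either missing or points at the forum itself.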

It would be great if there were a checkbox in the category (or tag?) settings that, when enabled, would automatically add noindex to all topics published under that category. Something like:

[ ] Hide Topics from this category in search results.

angus

Hey Thiago,

Just so I understand, are you saying the following:

  1. You have the site setting Embed set canonical URL enabled, and have always had it enabled.
  2. You have multiple topics published from your WordPress to your Discourse via the WordPress Discourse Plugin over a period of time.
  3. Up until recently, the Discourse topics published as described in 2 all had a link rel="canonical" in their head, with the href set to the WordPress URL.
  4. At some point recently, you think some subset of those topics which previously conformed to 3 now have a link rel="canonical" with the href set to the Discourse URL.

Is that correct?

Thiago_Mobilon

Hi Angus!

Yes, that’s it.

The `embed set canonical URL` is enabled, too:

You can see here that new topics are published with the noindex + canonical tags. But I’m seeing old topics without these tags.

angus

Thiago, if you have access to the server, could you please get the id of a topic where the canonical URL is not working, run the following in the Rails console, and share the result?

./launcher enter app
rails c
TopicEmbed.with_deleted.find_by(topic_id: TOPIC_ID) # replace TOPIC_ID with the numeric topic id

Thiago_Mobilon

discourse(prod)> TopicEmbed.with_deleted.find_by(topic_id:73608)
=> nil
discourse(prod)> TopicEmbed.with_deleted.find_by(topic_id:79015)
=> nil
discourse(prod)> TopicEmbed.with_deleted.find_by(topic_id:74248)
=> nil
discourse(prod)> TopicEmbed.with_deleted.find_by(topic_id:76598)
=> nil

angus

That’s the issue. In order for the canonical url for embeds feature to work, the topic needs a topic_embed record. Can you think of any reason why those topics may not have embed records?
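
Since the canonical URL depends on each topic having a topic_embed record, the affected topics are those in the embed category whose ids do not appear in the embed table. The sketch below shows that comparison as a plain Ruby set difference. The method name `topics_missing_embeds` is hypothetical, the id 80001 is invented sample data, and the Rails console lines in the comments are an untested assumption about how the two id lists might be collected.

```ruby
require "set"

# Returns the topic ids that have no embed record, preserving input order.
def topics_missing_embeds(topic_ids, embedded_topic_ids)
  embedded = embedded_topic_ids.to_set
  topic_ids.reject { |id| embedded.include?(id) }
end

# In a Rails console, the inputs might be collected like this (assumption,
# not verified against any particular Discourse version):
#   category_id = 5 # hypothetical id of the category WordPress publishes into
#   topic_ids   = Topic.where(category_id: category_id).pluck(:id)
#   embedded    = TopicEmbed.with_deleted.pluck(:topic_id)
#   topics_missing_embeds(topic_ids, embedded)

# Sample run: 73608 and 79015 come from the thread, 80001 is invented.
puts topics_missing_embeds([73608, 79015, 80001], [80001]).inspect
```

Including `with_deleted` keeps soft-deleted embed records in the comparison, so the result lists only topics with no record at all, which is the case reported in this thread.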

Thiago_Mobilon

I honestly don’t know what could have caused those topics to miss their topic_embed records.

But looking at the bigger picture, wouldn’t it make more sense to go with the configuration I suggested earlier? If we add a checkbox directly in the category settings to apply noindex to all topics within it, we wouldn’t have to rely on the embed feature or worry about whether those records exist in the first place.

angus

While that might make sense for your site, it would be a different feature from how topic embed canonical urls work. You could build it, but you’d have to do it as a custom plugin.

Canonical URLs for embeds work as expected; however, it seems that at some point the embed records were deleted, or some other operation was performed on your site. Discourse doesn’t hard delete topic embed records, so something else must have gone on there. Unless you do some custom work, you’ll need to republish those topics to create the embed records again.

Thiago_Mobilon

While this behaves differently from topic embeds, category-level indexing control is a basic SEO requirement for any modern CMS. There are several other Meta topics about this, and making it native would solve multiple use cases at once.

I might try to build a plugin using AI since Ruby isn’t my stack, but this really should be a native feature.

Regarding the missing records: we haven’t run any database commands or operations that could cause this. Also, republishing isn’t viable. We have nearly 50,000 posts, and we don’t even know which ones are affected. Fixing this would require complex API scripts to find, delete, and republish everything…