Python 3 script to create an RSS feed valid for Google News
This tutorial shows how to generate an rss.xml file suitable for submission to Google News.
Why is an rss.xml file useful? Mainly because it lets you automate publication tasks: the same file can drive automated tweets, Facebook posts and other publication workflows.
Repository: https://github.com/al118345/rss-python. Video: https://www.youtube.com/watch?v=k8mVioEJLL8
rss.xml structure
A good reference is the New York Times world RSS feed: https://rss.nytimes.com/services/xml/rss/nyt/World.xml
Its structure is similar to the snippet shown above. Of all these elements, the channel-level fields that must be filled in are:
| Required elements | Function |
|---|---|
| title | Contains the RSS channel title. |
| link | Contains the website URL. |
| description | Contains the RSS channel description. |
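As a minimal sketch of how these three fields fit together, the following builds an empty channel with Python's standard `xml.etree.ElementTree` module. The function name `build_channel` and the example values are placeholders invented for illustration; they are not taken from the repository.

```python
import xml.etree.ElementTree as ET


def build_channel(title: str, link: str, description: str) -> ET.Element:
    """Return an <rss> root holding a <channel> with the three required fields."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    return rss


# Placeholder values, not the real site data.
root = build_channel("Example blog", "https://example.com/", "Articles about Python")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Building the tree with `ElementTree` rather than string concatenation means special characters such as `&` in a title are escaped automatically.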
Implementation
The repository https://github.com/al118345/rss-python was built from this example and the information above. It contains the following code:
As you can see, the general idea is simple. First, a static section sets the website name, contact email and other identifying information. The feed later uses those values to identify the website and its author.
After that, the script loads a CSV document with the following structure:
| title | url | topic |
|---|---|---|
| Bayesian network fundamentals | https://1938.com.es/redes-bayesianas | mathematics |
| Introduction to MongoDB. Document query examples. | https://1938.com.es/mongodb | mongodb nosql |
The structure is intentionally simple: three columns with the title, URL and topic. This file is read by the script to generate the different feed entries automatically.
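The CSV-to-entries step could look roughly like the sketch below. It is not the repository's implementation: it assumes a comma-separated file with a `title,url,topic` header row, maps the topic column to `<category>`, and reuses the URL as `<guid>`. All of those are choices made for this example, as is the helper name `rows_to_items`.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed layout: comma-separated, with a header row "title,url,topic".
SAMPLE_CSV = """title,url,topic
Bayesian network fundamentals,https://1938.com.es/redes-bayesianas,mathematics
Introduction to MongoDB. Document query examples.,https://1938.com.es/mongodb,mongodb nosql
"""


def rows_to_items(csv_file, channel: ET.Element) -> int:
    """Append one <item> per CSV row to the channel and return the item count."""
    count = 0
    for row in csv.DictReader(csv_file):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["title"].strip()
        ET.SubElement(item, "link").text = row["url"].strip()
        # Mapping the topic column to <category> is a choice made for this sketch.
        ET.SubElement(item, "category").text = row["topic"].strip()
        # The article URL doubles as a permanent identifier.
        ET.SubElement(item, "guid", isPermaLink="true").text = row["url"].strip()
        count += 1
    return count


channel = ET.Element("channel")
n_items = rows_to_items(io.StringIO(SAMPLE_CSV), channel)
print(n_items, "items generated")
```

To read a real file instead of the inline sample, pass an open file object, for example `open("articles.csv", newline="", encoding="utf-8")`, where `articles.csv` is a hypothetical filename.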
What Google News expects from the feed
A feed should not be treated as a random list of links. Google News and other readers use it as a structured signal about what has changed on the site, which URL is canonical, when an item was published and whether the entry belongs to a recognizable editorial source. For that reason, the script should generate stable URLs, meaningful titles, clean descriptions and dates in a standard format.
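For the date requirement, RSS 2.0 expects RFC 822 style dates in `<pubDate>`. One way to produce them with the standard library is sketched here. The helper name `rss_date` and the sample date are made up for illustration, and normalising everything to UTC is an assumption rather than a rule.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 822 style date for <pubDate>."""
    if moment.tzinfo is None:
        raise ValueError("pubDate needs a timezone-aware datetime")
    # Normalise to UTC so every entry in the feed uses the same zone.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


published = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)  # made-up example date
print(rss_date(published))  # Mon, 15 Jan 2024 09:30:00 GMT
```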
The most common mistake is to create a valid XML document that is still poor from an editorial point of view. If every item has a generic description, several titles are almost identical, or the feed points to pages that redirect or contain thin content, the feed will be technically correct but weak for discovery.
Validation checklist before publishing
- Open the generated XML in a browser and check that it has no escaping errors or broken characters.
- Verify that every link returns HTTP 200 and uses the same canonical URL that appears in the page HTML.
- Use a descriptive channel title, a real site link, language metadata and an updated build date.
- Avoid duplicate items: each article should appear once, with one permanent URL and one clear topic.
- Keep the CSV source under version control so feed changes can be reviewed like any other content change.
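Part of this checklist can be automated. The sketch below covers only the offline checks (well-formed XML, channel metadata, duplicate links) and leaves out the HTTP 200 check because that needs network access. The function name `check_feed`, the list of channel tags it inspects, and the sample feed are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_feed(xml_text: str) -> list[str]:
    """Return a list of human-readable problems found in an RSS document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    channel = root.find("channel")
    if channel is None:
        return ["missing <channel> element"]

    problems = []
    # Channel metadata named in the checklist above.
    for tag in ("title", "link", "description", "language", "lastBuildDate"):
        node = channel.find(tag)
        if node is None or not (node.text or "").strip():
            problems.append(f"channel <{tag}> is missing or empty")

    # Each article should appear once, so every item link must be unique.
    links = [(item.findtext("link") or "").strip() for item in channel.findall("item")]
    for link, seen in Counter(links).items():
        if seen > 1:
            problems.append(f"duplicate item link: {link}")
    return problems


sample = """<rss version="2.0"><channel>
<title>Example</title><link>https://example.com/</link>
<description>Demo feed</description>
<item><title>A</title><link>https://example.com/a</link></item>
<item><title>A again</title><link>https://example.com/a</link></item>
</channel></rss>"""
for problem in check_feed(sample):
    print(problem)
```

For the sample feed this reports the missing `language` and `lastBuildDate` elements and the repeated link.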
Publication workflow
In a real project, this script can run after publishing a new article. The usual flow is: update the CSV or content database, regenerate the RSS file, upload it with the static assets, request a crawl if the article is important and monitor Search Console for indexing problems. This does not force Google to index a page, but it gives crawlers a clean and consistent discovery path.
It also connects well with other automation tasks: an RSS entry can feed a newsletter, a social post queue or a small internal dashboard that checks whether recent articles have title, description, canonical and sitemap coverage. The important part is that the RSS should reflect the public website, not become a separate source of truth with different URLs or summaries.
Another useful improvement is to add automated validation before writing the final file. The script can reject empty titles, relative URLs, missing topics or duplicated links before generating XML. This prevents the feed from publishing low-quality entries that later have to be removed from Google News or Search Console.
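A minimal sketch of that validation step, assuming each row is a dictionary with `title`, `url` and `topic` keys (as `csv.DictReader` would produce for the CSV above). The function name `validate_rows` and the error messages are invented for illustration.

```python
from urllib.parse import urlparse


def validate_rows(rows):
    """Split rows into (valid, errors) using the rules described above."""
    valid, errors, seen_urls = [], [], set()
    for number, row in enumerate(rows, start=1):
        title = (row.get("title") or "").strip()
        url = (row.get("url") or "").strip()
        topic = (row.get("topic") or "").strip()
        parsed = urlparse(url)
        if not title:
            errors.append(f"row {number}: empty title")
        elif parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"row {number}: URL is not absolute: {url!r}")
        elif not topic:
            errors.append(f"row {number}: missing topic")
        elif url in seen_urls:
            errors.append(f"row {number}: duplicated link {url}")
        else:
            seen_urls.add(url)
            valid.append(row)
    return valid, errors


rows = [
    {"title": "Bayesian network fundamentals",
     "url": "https://1938.com.es/redes-bayesianas", "topic": "mathematics"},
    {"title": "", "url": "https://1938.com.es/mongodb", "topic": "mongodb nosql"},
    {"title": "Relative link", "url": "/mongodb", "topic": "mongodb nosql"},
    {"title": "Copy", "url": "https://1938.com.es/redes-bayesianas", "topic": "mathematics"},
]
valid, errors = validate_rows(rows)
print(len(valid), "valid rows")
for message in errors:
    print(message)
```

Only the first row passes here; the other three are rejected for an empty title, a relative URL and a duplicated link. Returning the errors instead of raising lets the caller decide whether a single bad row should block the whole build.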
If the feed is generated from a CMS or a static-site build, keep the same publication rules as the sitemap: only include pages that are indexable, canonical and useful for readers. Tag archives, temporary URLs, search results and tests should not be mixed with editorial articles. Clean discovery signals are boring, but they are exactly what crawlers need.
A final practical check is to compare the RSS, the sitemap and the visible article list. Important URLs should appear in all three places with the same final address. If a page is present in the feed but not linked internally, it may still be discovered, but it sends a weaker quality signal than an article connected from related content.
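Once the three URL lists have been collected, the comparison itself is a small set operation. The sketch below assumes the lists are already available as plain strings; how they are extracted is left out. The function name `coverage_report` and the `example.com` URLs are placeholders.

```python
def coverage_report(rss_urls, sitemap_urls, listed_urls):
    """Map each URL missing from at least one source to the sources lacking it."""
    sources = {
        "rss": set(rss_urls),
        "sitemap": set(sitemap_urls),
        "article list": set(listed_urls),
    }
    all_urls = set().union(*sources.values())
    report = {}
    for url in sorted(all_urls):
        missing = [name for name, urls in sources.items() if url not in urls]
        if missing:
            report[url] = missing
    return report


# Placeholder URLs for illustration only.
report = coverage_report(
    rss_urls=["https://example.com/a", "https://example.com/b"],
    sitemap_urls=["https://example.com/a", "https://example.com/c"],
    listed_urls=["https://example.com/a"],
)
for url, missing in report.items():
    print(url, "is missing from:", ", ".join(missing))
```

An empty report means every URL appears in all three places with exactly the same address.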
The final generated RSS looks like this: