Using Python to delete specified text from thousands of old blog posts

I’m a web editor who uses WordPress, and my site has an annoying problem. We have tens of thousands of articles going back about ten years, and we have to delete all of the images posted in articles from around 2012 to 2018. The reason is that the editor of the website at the time had a bad habit of using Creative Commons images without attributing them correctly, so we’re now vulnerable to legal action. I batch-deleted all the actual images from our media library, but that still leaves stray bits of image attribution text sitting in old articles, and it all looks a complete mess. It took me about a day just to go through one month of old articles to correct this.

Anyway, before I get forced to throw myself out a window and end the misery, I wondered whether, conceptually speaking, it might be possible to write some Python code to automate this process. What is required is a program that can go through every article we published between 2012 and 2018, identify the image attribution text (all the image attributions start with "Credit:") and then delete it. I’m a novice with Python and I’ve only just started thinking about this, but I wondered if anyone with more experience thinks this is at least possible. I honestly think it will take me less time to learn Python and do this than to go through each article manually, given the volume of content on this site.
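
As a rough illustration of what such a program could look like, here is a minimal sketch of the text-cleaning step only. It rests on several assumptions that are not stated in the question: that each attribution sits on its own line of the post content, that the visible text of that line begins with "Credit:", and that the posts have already been loaded into Python as dictionaries with a `published` date and a `content` string. The names `strip_credit_lines`, `clean_posts`, `published` and `content` are hypothetical, and reading posts from WordPress and writing them back is left out entirely.

```python
import re
from datetime import date

# Matches any HTML tag, so we can look at the visible text of a line.
TAG_RE = re.compile(r"<[^>]+>")


def strip_credit_lines(content: str) -> str:
    """Drop every line whose visible text starts with 'Credit:'.

    Assumption: each attribution occupies a whole line of its own.
    """
    kept = []
    for line in content.splitlines():
        visible = TAG_RE.sub("", line).strip()
        if not visible.startswith("Credit:"):
            kept.append(line)
    return "\n".join(kept)


def clean_posts(posts, first_year=2012, last_year=2018):
    """Return copies of the posts, cleaning only those in the year range.

    Each post is assumed to be a dict with a 'published' date and a
    'content' string (a hypothetical structure for this sketch).
    """
    cleaned = []
    for post in posts:
        new_post = dict(post)  # copy, so the original data is untouched
        if first_year <= post["published"].year <= last_year:
            new_post["content"] = strip_credit_lines(post["content"])
        cleaned.append(new_post)
    return cleaned
```

Because `clean_posts` returns copies rather than changing the originals, the before and after versions of a handful of articles can be compared by eye before anything is saved. If the attributions turn out to share a line with other text, the line-based check above would remove too much or too little, so it is worth running it on a small sample and on a backup first.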

This article was republished from its original source.