Well,
10 years of blogging as a brain-damaged refugee has had interesting repercussions.
Lots of grammatical “issues”. But the biggest issue was that about 50% of the images posted had links back to the webpage. As I’m trying to migrate all this to another platform, that was a problem.
Asking how this might be done produced less than no response. So, after the success I had converting the Mealie stuff, I headed over to ChatGPT to see if it could craft a Python routine to strip the data.
The challenge was creating a script that would delete a complete character string that began and ended with the same data, with different text in between.
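That idea can be sketched with a tiny example (the `[x]` markers and the sample string are mine, purely for illustration): the non-greedy `.*?` quantifier makes each marked span match separately, instead of one greedy match swallowing everything between the first opening marker and the last closing one.

```python
import re

# Illustrative only: delete every span that begins and ends with fixed
# markers but has varying text in between.
text = 'before [x]varies a lot[/x] middle [x]something else[/x] after'
# ".*?" matches as little as possible, so each [x]...[/x] pair is
# removed on its own and the word "middle" survives
cleaned = re.sub(r'\[x\].*?\[/x\]', '', text)
print(cleaned)  # before  middle  after
```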
The first script worked, but there was more HTML to strip, and that was fairly easy.
The challenge came in not deleting links that were not attached to photos. I realized this was happening about halfway through. So, some head scratching ensued and a suitable routine was crafted. It worked great on the remainder. I had to go back and re-insert the links that had been deleted by the first script.
For those of you who are curious, here is the Python routine:
import re
import os

def replace_href_tags_with_nulls(input_path, output_path=None):
    with open(input_path, 'r', encoding='utf-8') as file:
        content = file.read()

    # Strip the opening <a href="https://dubea.com..."> only when an image
    # immediately follows it; the trailing <img src= in the pattern is what
    # leaves ordinary links alone. (The pattern includes the leading < and
    # escapes the dot in dubea.com so it matches literally.)
    modified_content_1 = re.sub(r'<a href="https://dubea\.com.*?><img src=', '<img src=', content)
    # Then drop the now-orphaned closing </a> sitting before </figure>
    modified_content = re.sub(r'</a></figure>', '</figure>', modified_content_1)

    if output_path:
        with open(output_path, 'w', encoding='utf-8') as file:
            file.write(modified_content)
    return modified_content

# Example usage
input_file = 'page.html'           # Your uploaded file
output_file = 'page_cleaned.html'  # Optional output file

if os.path.exists(output_file):
    os.remove(output_file)

result = replace_href_tags_with_nulls(input_file, output_file)
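As a quick sanity check, here are the same two substitutions run on a small made-up snippet (the URLs and file name are illustrative, not from the actual blog). The `<img src=` anchor in the first pattern is what keeps the plain “About” link intact:

```python
import re

sample = ('<figure><a href="https://dubea.com/p/1"><img src="pic.jpg">'
          '</a></figure> and <a href="https://dubea.com/about">About</a>')
# Only strip the opening tag when an image immediately follows it
step1 = re.sub(r'<a href="https://dubea\.com.*?><img src=', '<img src=', sample)
# Then drop the orphaned closing </a> before </figure>
step2 = re.sub(r'</a></figure>', '</figure>', step1)
print(step2)
# <figure><img src="pic.jpg"></figure> and <a href="https://dubea.com/about">About</a>
```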
I need to make a final pass to clean up the fat-finger issues, but I’m nearly done.
Next time, I’ll talk a little about the Fuji X-Pro2.
take care