Python: How to replace text in pdf

Python: How to replace text in pdf

I have a pdf file and i want to replace some text in pdf file and generate new pdf. How can i do that in python? I have tried reportlab , reportlab does not have any fucntion to search text and replace it. What other module can i use?


You can try Aspose.PDF Cloud SDK for Python, Aspose.PDF Cloud is a REST API PDF Processing solution. It is paid API and its free package plan provides 50 credits per month.

I'm developer evangelist at Aspose.

import os
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi

# Get App key and App SID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx')

pdf_api = PdfApi(pdf_api_client)
filename = '02_pages.pdf'
remote_name = '02_pages.pdf'
copied_file= '02_pages_new.pdf'
#upload PDF file to storage
pdf_api.upload_file(remote_name,filename)

#upload PDF file to storage
pdf_api.copy_file(remote_name,copied_file)

#Replace Text
text_replace = asposepdfcloud.models.TextReplace(old_value='origami',new_value='polygami',regex='true')
text_replace_list = asposepdfcloud.models.TextReplaceListRequest(text_replaces=[text_replace])

response = pdf_api.post_document_text_replace(copied_file, text_replace_list)
print(response)

Read links such as the following: * Search and replace placeholder text in PDF with Python * How to Work With a PDF in Python – Real Python * Python: How to replace text in pdf Which can be readily found via any search engine.


Have a look in THIS thread for one of the many ways to read text from a PDF. Then you'll need to create a new pdf, as they will, as far as I know, not retrieve any formatting for you.

Sample Python code to use PDFTron SDK for searching and replacing text strings and images inside existing PDF files (e.g. business cards and other PDF templates). Unlike PDF forms, the ContentReplacer works on actual PDF content and is not limited to static rectangular annotation regions.


The CAM::PDF Perl Library can output text that's not too hard to parse (it seems to fairly randomly split lines of text). I couldn't be bothered to learn too much Perl, so I wrote these really basic Perl command line scripts, one that reads a single page pdf to a text file perl read.pl pdfIn.pdf textOut.txt and one that writes the text (that you can modify in the meantime) to a pdf perl write.pl pdfIn.pdf textIn.txt pdfOut.pdf.

#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";

$pdfIn = $ARGV[0];
$textOut = $ARGV[1];

$pdf = CAM::PDF->new($pdfIn);
$page = $pdf->getPageContent(1);

open(my $fh, '>', $textOut);
print $fh $page;
close $fh;

exit;

and

#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";

$pdfIn = $ARGV[0];
$textIn = $ARGV[1];
$pdfOut = $ARGV[2];

$pdf = CAM::PDF->new($pdfIn);

my $page;
   open(my $fh, '<', $textIn) or die "cannot open file $filename";
   {
       local $/;
       $page = <$fh>;
   }
close($fh);

$pdf->setPageContent(1, $page);

$pdf->cleanoutput($pdfOut);

exit;

You can call these with python either side of doing some regex etc stuff on the outputted text file.

If you're completely new to Perl (like I was), you need to make sure that Perl and CPAN are installed, then run sudo cpan, then in the prompt install "CAM::PDF";, this will install the required modules.

Also, I realise that I should probably be using stdout etc, but I was in a hurry :-)

Also also, any ideas what the format CAM-PDF outputs is? is there any doc for it?

On the PDF file, press “Ctrl+F” on your keyboard and input the text you would like to be replaced, then type in new text in the input field of Replace to modify the current one to this new text. Click on “Replace” to start replacing the texts.You may select font style, font size and other formatting options on the “Format” tab.


Thanks Yuya. The above solution worked well. Note: You need to take backup of your original file first, since it replaces your original file itself. If you want to repeatedly replace text then you can keep adding last 2 lines as below. text = text.replace(text_to_search, replacement_text) path.write_text(text) – Nages Mar 1 at 1:22


Convert Excel file to PDF with PowerShell Jan. 30, 2020 How to compress and decompress folders in ZIP format using Python Jan. 3, 2020 How to open Excel file with password in PowerShell Dec. 24, 2019 Replace string in Excel with PowerShell Dec. 20, 2019 [Python] Replace text in Excel cells Nov. 1, 2019


In this step-by-step tutorial, you'll learn how to work with a PDF in Python. You'll see how to extract metadata from preexisting PDFs . You'll also learn how to merge, split, watermark, and rotate pages in PDFs using Python and PyPDF2.