Think about when you read PDF files: reading e-books, looking over company documents, or filling out forms. PDF documents have become a routine part of daily life.
Most people know how to edit PDFs with a PDF editor such as PDF Reader Pro. If you are curious about how PDF editors work under the hood, read on. Here we cover a simple but vital feature: searching for text and marking it up with C#.
ComPDFKit provides several ways to search text. When your users work with a PDF file of only a few pages, searching the whole file is the easiest way for them to find whatever they need. In a file with hundreds of pages, however, the same word may appear dozens of times, so it is practical to let users restrict the search to a specific range first. The methods below search text within different ranges.
Search a specific page when you want to annotate every occurrence of a word on that page. Get the page from the CPDFDocument object with PageAtIndex, which takes a zero-based index, so Page 5 is index 4. The following code searches Page 5 for the text "ComPDFKit".
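```csharp
using System.Collections.Generic;
using System.Windows; // Rect (WPF)
// The ComPDFKit SDK namespaces are also required.

CPDFDocument document = CPDFDocument.InitWithFilePath("***");
// PageAtIndex is zero-based, so Page 5 is index 4.
CPDFPage page = document.PageAtIndex(4);
if (page == null)
{
    document.Release(); // release the document on the early exit too
    return;
}

List<Rect> rects = new List<Rect>();       // bounding box of each match
List<string> strings = new List<string>(); // matched text of each match

CPDFTextPage textPage = page.GetTextPage();
CPDFTextSearcher searcher = new CPDFTextSearcher();
int findIndex = 0;
if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
{
    CRect textRect = new CRect();
    string textContent = "";
    // Each FindNext call fills textRect and textContent with the next match.
    while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
    {
        strings.Add(textContent);
        rects.Add(new Rect(textRect.left, textRect.top, textRect.width(), textRect.height()));
    }
}
searcher.FindClose();
document.Release();
```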
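Conceptually, the FindStart/FindNext loop walks through the page text and reports one match per call until none remain, while C_Search_Options.Search_Case_Sensitive decides whether letter case must match. The sketch below illustrates that pattern on a plain string using only the .NET base library. It is an analogy, not ComPDFKit's implementation, and the names TextMatchSketch and FindAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the "find the next match until none remain"
// pattern on a plain string. It is not part of ComPDFKit.
public static class TextMatchSketch
{
    // Returns the start offset of every non-overlapping occurrence of keyword in text.
    public static List<int> FindAll(string text, string keyword, bool caseSensitive)
    {
        var offsets = new List<int>();
        if (string.IsNullOrEmpty(keyword))
            return offsets; // an empty keyword would match everywhere

        StringComparison comparison = caseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        int start = 0;
        while (start <= text.Length - keyword.Length)
        {
            int hit = text.IndexOf(keyword, start, comparison);
            if (hit < 0)
                break; // no more matches, similar to FindNext returning false
            offsets.Add(hit);
            start = hit + keyword.Length; // resume searching after the current match
        }
        return offsets;
    }
}
```

For example, searching "ComPDFKit and compdfkit" case-sensitively finds one match at offset 0, while a case-insensitive search also finds "compdfkit" at offset 14.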
ComPDFKit can also search a range of pages: call the PageAtIndex method for each page index in the range. A page range is a good choice when you don't want to search the whole file but can't remember exactly which page the text is on. The following code searches for "ComPDFKit" on Pages 5 to 7 (indices 4 to 6).
```csharp
using System.Collections.Generic;
using System.Windows; // Rect (WPF)
// The ComPDFKit SDK namespaces are also required.

CPDFDocument document = CPDFDocument.InitWithFilePath("***");
List<Rect> rects = new List<Rect>();
List<string> strings = new List<string>();

// Pages 5 to 7 are zero-based indices 4 to 6.
for (int i = 4; i <= 6; i++)
{
    CPDFPage page = document.PageAtIndex(i);
    if (page == null)
        continue; // skip pages that could not be loaded

    CPDFTextPage textPage = page.GetTextPage();
    CPDFTextSearcher searcher = new CPDFTextSearcher();
    int findIndex = 0;
    if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
    {
        CRect textRect = new CRect();
        string textContent = "";
        while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
        {
            strings.Add(textContent);
            rects.Add(new Rect(textRect.left, textRect.top, textRect.width(), textRect.height()));
        }
    }
    searcher.FindClose(); // close each page's search before moving to the next page
}
document.Release();
```
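The loop above hard-codes indices 4 to 6. When the range comes from user input, it is easy to be off by one page or to run past the end of a short document. The hypothetical helper below (PageRangeSketch and ToPageIndices are illustrative names, not ComPDFKit APIs) converts a 1-based, inclusive page range into the zero-based indices PageAtIndex expects, clamped to the pages that exist.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns a 1-based, inclusive page range as a user would enter it
// (for example "5 to 7") into zero-based page indices.
public static class PageRangeSketch
{
    public static List<int> ToPageIndices(int firstPage, int lastPage, int pageCount)
    {
        if (firstPage > lastPage)
            (firstPage, lastPage) = (lastPage, firstPage); // accept a reversed range

        // Clamp to the pages that actually exist (1..pageCount).
        int first = Math.Max(firstPage, 1);
        int last = Math.Min(lastPage, pageCount);

        var indices = new List<int>();
        for (int page = first; page <= last; page++)
            indices.Add(page - 1); // page number -> zero-based index
        return indices;
    }
}
```

With this helper, the loop header in the range sample could become foreach (int i in PageRangeSketch.ToPageIndices(5, 7, document.PageCount)), so pages beyond the end of the document are never requested.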
To search the whole file, ComPDFKit PDF SDK lets you loop over every page from index 0 to PageCount - 1 for programmatic full-text search. Suppose you have a five-page PDF document and want to find every occurrence of "ComPDFKit". The following C# code does this.
```csharp
using System.Collections.Generic;
using System.Windows; // Rect (WPF)
// The ComPDFKit SDK namespaces are also required.

CPDFDocument document = CPDFDocument.InitWithFilePath("***");
List<Rect> rects = new List<Rect>();
List<string> strings = new List<string>();

// Visit every page of the document.
for (int i = 0; i < document.PageCount; i++)
{
    CPDFPage page = document.PageAtIndex(i);
    if (page == null)
        continue; // skip pages that could not be loaded

    CPDFTextPage textPage = page.GetTextPage();
    CPDFTextSearcher searcher = new CPDFTextSearcher();
    int findIndex = 0;
    if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
    {
        CRect textRect = new CRect();
        string textContent = "";
        while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
        {
            strings.Add(textContent);
            rects.Add(new Rect(textRect.left, textRect.top, textRect.width(), textRect.height()));
        }
    }
    searcher.FindClose();
}
document.Release();
```
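The whole-file sample adds matches from every page to the same rects and strings lists, so it no longer records which page each match came from. Because annotations are created on a specific page (page.CreateAnnot in the markup example), you will usually want to keep the page index with each hit. The sketch below uses a hypothetical SearchHit class and groups hits by page with LINQ. Inside the FindNext loop you would add new SearchHit(i, textContent); in a real application you would also store the match rectangle.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type; a real one would also hold the match rectangle.
public sealed class SearchHit
{
    public SearchHit(int pageIndex, string text)
    {
        PageIndex = pageIndex;
        Text = text;
    }

    public int PageIndex { get; } // zero-based page index, as used by PageAtIndex
    public string Text { get; }   // matched text reported by the search
}

public static class SearchHitGrouping
{
    // Groups hits by page so that each page can be processed (e.g. annotated) on its own.
    public static Dictionary<int, List<SearchHit>> ByPage(IEnumerable<SearchHit> hits)
    {
        return hits
            .GroupBy(hit => hit.PageIndex)
            .ToDictionary(group => group.Key, group => group.ToList());
    }
}
```

Grouping first means each page is loaded and annotated once, no matter how many matches it contains.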
Once you can search text across different pages, think about what happens after the search. Searching usually serves two purposes: locating the text and marking it up.
After a search, you can show the results one at a time, or collect all of them and let the user pick one to inspect.
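Both presentation styles can share one small piece of state: the list of results and the index of the result currently shown. The class below is a hypothetical sketch in plain C# (SearchResultNavigator is not a ComPDFKit type). It wraps around at either end, a common choice for "find next" and "find previous" buttons.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical navigator over search results. T can be whatever the search loop
// collects: matched text, a rectangle, or a page/rectangle pair.
public sealed class SearchResultNavigator<T>
{
    private readonly IReadOnlyList<T> results;
    private int current = -1; // -1 means nothing is selected yet

    public SearchResultNavigator(IReadOnlyList<T> results)
    {
        this.results = results ?? throw new ArgumentNullException(nameof(results));
    }

    public int Count => results.Count;
    public int CurrentIndex => current;

    // Moves to the next result, wrapping to the first one after the last.
    public T Next()
    {
        if (results.Count == 0)
            throw new InvalidOperationException("No search results.");
        current = (current + 1) % results.Count;
        return results[current];
    }

    // Moves to the previous result, wrapping to the last one before the first.
    public T Previous()
    {
        if (results.Count == 0)
            throw new InvalidOperationException("No search results.");
        current = current <= 0 ? results.Count - 1 : current - 1;
        return results[current];
    }

    // Jumps directly to a result the user picked from the full result list.
    public T Select(int index)
    {
        if (index < 0 || index >= results.Count)
            throw new ArgumentOutOfRangeException(nameof(index));
        current = index;
        return results[index];
    }
}
```

For example, you could build the navigator from the strings or rects list collected by the search code, call Next() each time the user clicks a "next result" button, and call Select(index) when they pick an entry from a list of all results.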
ComPDFKit provides several markup types, such as highlight, underline, and squiggly. The following code first searches Page 5 for "ComPDFKit" and collects the rectangle of each match.
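```csharp
using System.Collections.Generic;
// The ComPDFKit SDK namespaces are also required.

CPDFDocument document = CPDFDocument.InitWithFilePath("***");
// Page 5 is zero-based index 4.
CPDFPage page = document.PageAtIndex(4);
if (page == null)
{
    document.Release();
    return;
}

// Keep the CRect values as they are: the highlight annotation takes them directly.
List<CRect> rects = new List<CRect>();
CPDFTextPage textPage = page.GetTextPage();
CPDFTextSearcher searcher = new CPDFTextSearcher();
int findIndex = 0;
if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
{
    CRect textRect = new CRect();
    string textContent = "";
    while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
    {
        rects.Add(textRect);
    }
}
searcher.FindClose();
// Keep the document open: the next step adds an annotation to this page.
```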
Then create a highlight annotation on the same page from the collected rectangles:
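```csharp
CPDFHighlightAnnotation highlight = page.CreateAnnot(C_ANNOTATION_TYPE.C_ANNOTATION_HIGHLIGHT) as CPDFHighlightAnnotation;
if (highlight != null)
{
    byte[] color = { 0, 255, 0 };   // RGB: green
    highlight.SetColor(color);
    highlight.SetTransparency(120);
    highlight.SetQuardRects(rects); // one highlight covering every match
    highlight.UpdateAp();           // regenerate the annotation's appearance
}
// Save the document before releasing it if the highlight should persist in the file.
document.Release();
```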
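The sample passes 120 to SetTransparency. Assuming this value is an alpha level on a 0 to 255 scale where higher means more opaque (an assumption; check the ComPDFKit API reference to confirm), 120 is roughly 47% opacity, which keeps the highlighted text readable. If you prefer to think in percentages, a small hypothetical helper can do the conversion:

```csharp
using System;

// Hypothetical helper. ASSUMES the SDK's transparency value is an alpha level
// from 0 (fully transparent) to 255 (fully opaque).
public static class OpacitySketch
{
    public static byte PercentToAlpha(double percent)
    {
        if (percent < 0 || percent > 100)
            throw new ArgumentOutOfRangeException(nameof(percent), "Expected a value from 0 to 100.");
        return (byte)Math.Round(percent / 100.0 * 255.0);
    }
}
```

Under that assumption, highlight.SetTransparency(OpacitySketch.PercentToAlpha(47)) passes the same value, 120, as the sample.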
ComPDFKit supports many more operations for processing PDF documents, for example:
Redact: Remove sensitive or personal information from PDF files irreversibly.
We will keep developing practical PDF features that give our customers what they need, and we will continue to share what we learn about PDF technology. To learn more about ComPDFKit, visit the ComPDFKit website.