How to Search & Mark up Text in PDFs with C#

Tutorials | Mark up · Search Text · How to · C# Fri. 26 Aug. 2022

Think about when you read PDF files: reading e-books, reviewing company documents, or filling out forms. PDF documents are an increasingly common part of daily life.

 

Most people know how to edit PDFs with a PDF editor like PDF Reader Pro. If you are interested in how PDF editors work, keep reading. Here we cover a simple but vital part: searching and marking up text with C#.



Search & Select Text

 

We provide different methods to search text. When your users work with a PDF file of only a few pages, searching the whole file is enough to find what they want. In files with hundreds of pages, however, the same words may appear dozens of times, so it is practical to let users set a specific range before searching. Here are the methods for searching text in different ranges.

 

From a Specific Page

 

Search a specific page when you want to annotate repeated words on that page. Get the page from the CPDFDocument object. The following code searches for the text “ComPDFKit” on page 5, which is page index 4 because indexes start at 0.

 

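// Open the document and fetch page 5 (page indexes are zero-based).
CPDFDocument document = CPDFDocument.InitWithFilePath("***");
CPDFPage page = document.PageAtIndex(4);

if (page == null)
{
    // Release the document before leaving early.
    document.Release();
    return;
}

// Bounding rectangles and matched text for each hit.
List<Rect> rects = new List<Rect>();
List<string> strings = new List<string>();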

CPDFTextPage textPage = page.GetTextPage();
CPDFTextSearcher searcher = new CPDFTextSearcher();
int findIndex = 0;

if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
{
    CRect textRect = new CRect();
    string textContent = "";
    while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
    {
        strings.Add(textContent);
        rects.Add(new Rect(textRect.left, textRect.top, textRect.width(), textRect.height()));
    }
}
searcher.FindClose();
document.Release();

 

From a Specific Range

 

ComPDFKit also lets you search text in a specific page range. Calling the PageAtIndex method in a loop defines the range. When you don’t want to search the whole file and can’t remember the exact page, a page range is a sensible choice. Let’s see how to search for the text “ComPDFKit” on pages 5 to 7.

 

CPDFDocument document = CPDFDocument.InitWithFilePath("***");
List<Rect> rects = new List<Rect>();
List<string> strings = new List<string>();

// Pages 5 to 7 correspond to zero-based indexes 4 to 6.
for (int i = 4; i <= 6; i++)
{
    CPDFPage page = document.PageAtIndex(i);
    if (page == null)
        continue;

    CPDFTextPage textPage = page.GetTextPage();
    CPDFTextSearcher searcher = new CPDFTextSearcher();
    int findIndex = 0;

    if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
    {
        CRect textRect = new CRect();
        string textContent = "";
        while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
        {
            strings.Add(textContent);
            rects.Add(new Rect(textRect.left, textRect.top, textRect.width(), textRect.height()));
        }
    }
    searcher.FindClose();
}
document.Release();
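
The loop above hard-codes the indexes 4 to 6. If the range comes from user input, it usually arrives as 1-based page numbers and may run past the end of the file. The following is a minimal sketch of that conversion in plain C#; PageRanges and ToIndices are hypothetical helper names, not part of ComPDFKit.

```csharp
using System;
using System.Collections.Generic;

public static class PageRanges
{
    // Hypothetical helper: converts a 1-based, inclusive page range
    // (as a reader would type it) into zero-based page indexes.
    // pageCount is the total number of pages in the document.
    public static List<int> ToIndices(int firstPage, int lastPage, int pageCount)
    {
        var indices = new List<int>();
        if (pageCount <= 0 || firstPage > lastPage)
            return indices;

        // Clamp the range to pages that actually exist.
        int start = Math.Max(firstPage, 1);
        int end = Math.Min(lastPage, pageCount);

        for (int page = start; page <= end; page++)
            indices.Add(page - 1);

        return indices;
    }
}
```

With such a helper, the loop could iterate over PageRanges.ToIndices(5, 7, document.PageCount) instead of fixed numbers, and a range that lies entirely outside the document simply yields no pages.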

 

From the Whole Document

 

For the whole file, ComPDFKit PDF SDK offers developers an API for programmatic full-text search. Suppose you have a PDF document with 5 pages and you want to find every occurrence of “ComPDFKit”. The code below loops over all pages and searches each one in C#.

 

CPDFDocument document = CPDFDocument.InitWithFilePath("***");
List<Rect> rects = new List<Rect>();
List<string> strings = new List<string>();

// Visit every page in the document.
for (int i = 0; i < document.PageCount; i++)
{
    CPDFPage page = document.PageAtIndex(i);
    if (page == null)
        continue;

    CPDFTextPage textPage = page.GetTextPage();
    CPDFTextSearcher searcher = new CPDFTextSearcher();
    int findIndex = 0;

    if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
    {
        CRect textRect = new CRect();
        string textContent = "";
        while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
        {
            strings.Add(textContent);
            rects.Add(new Rect(textRect.left, textRect.top, textRect.width(), textRect.height()));
        }
    }
    searcher.FindClose();
}
document.Release();
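
All three snippets share one loop shape: start a search, then keep asking for the next match until none is left. The sketch below shows that shape on a plain string with String.IndexOf, including a case-sensitivity switch similar in spirit to Search_Case_Sensitive. It is only an analogy under stated assumptions: PlainTextSearch and FindAll are hypothetical names, and the code says nothing about how ComPDFKit matches text internally.

```csharp
using System;
using System.Collections.Generic;

public static class PlainTextSearch
{
    // Hypothetical helper: returns the start position of every
    // non-overlapping occurrence of keyword in text.
    public static List<int> FindAll(string text, string keyword, bool caseSensitive)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(keyword))
            return positions;

        StringComparison comparison = caseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        int index = 0;
        while (index <= text.Length - keyword.Length)
        {
            int found = text.IndexOf(keyword, index, comparison);
            if (found < 0)
                break;

            positions.Add(found);
            // Continue after the current match, like advancing findIndex.
            index = found + keyword.Length;
        }
        return positions;
    }
}
```

For the text "ComPDFKit and compdfkit", a case-sensitive search for "ComPDFKit" finds one match, while a case-insensitive search finds two.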

 

 

The Operations After Text Search

 

Once you know how to search text on different pages, the next question is what to do with the results. Text search usually has two goals: locating the text and marking it up.

 

When you search and get the text you want, the results can be displayed one by one. You can also get all the results, and select one of them to look over. 
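
Stepping through results one by one, or jumping straight to a chosen result, can be sketched as a small navigator over the list of hits. This is an illustrative design in plain C#, not a ComPDFKit class; SearchResultNavigator and its members are hypothetical names, and the wrap-around behavior is an assumption about what a viewer might want.

```csharp
using System;
using System.Collections.Generic;

public sealed class SearchResultNavigator<T>
{
    private readonly IReadOnlyList<T> _results;
    private int _current = -1; // -1 means no result is selected yet

    public SearchResultNavigator(IReadOnlyList<T> results)
    {
        _results = results ?? throw new ArgumentNullException(nameof(results));
    }

    public int Count => _results.Count;
    public int CurrentIndex => _current;

    // Move to the next or previous result, wrapping around at the ends.
    public bool TryNext(out T result) => TryMove(+1, out result);
    public bool TryPrevious(out T result) => TryMove(-1, out result);

    // Jump directly to a result chosen from the full list.
    public bool TrySelect(int index, out T result)
    {
        if (index < 0 || index >= _results.Count)
        {
            result = default(T);
            return false;
        }
        _current = index;
        result = _results[index];
        return true;
    }

    private bool TryMove(int step, out T result)
    {
        if (_results.Count == 0)
        {
            result = default(T);
            return false;
        }
        // With nothing selected, Next starts at the first hit and Previous at the last.
        int target = _current < 0
            ? (step > 0 ? 0 : _results.Count - 1)
            : (_current + step + _results.Count) % _results.Count;
        return TrySelect(target, out result);
    }
}
```

A viewer could wrap the list of search rectangles in such a navigator and scroll to whichever result TryNext or TrySelect returns.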

 

ComPDFKit provides several markup types, such as highlight, underline, and squiggly. The following code searches for the text “ComPDFKit” on page 5 and collects the rectangle of each match.

 

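CPDFDocument document = CPDFDocument.InitWithFilePath("***");
// Rectangles of the matches, kept as CRect for the highlight annotation.
List<CRect> rects = new List<CRect>();

CPDFPage page = document.PageAtIndex(4);
if (page == null)
{
    // Release the document before leaving early.
    document.Release();
    return;
}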

CPDFTextPage textPage = page.GetTextPage();
CPDFTextSearcher searcher = new CPDFTextSearcher();
int findIndex = 0;

if (searcher.FindStart(textPage, "ComPDFKit", C_Search_Options.Search_Case_Sensitive, findIndex))
{
    CRect textRect = new CRect();
    string textContent = "";
    while (searcher.FindNext(page, textPage, ref textRect, ref textContent, ref findIndex))
    {
        rects.Add(textRect);
    }
}
searcher.FindClose();

 

Then create a highlight annotation on the page from the collected rectangles:

CPDFHighlightAnnotation highlight = page.CreateAnnot(C_ANNOTATION_TYPE.C_ANNOTATION_HIGHLIGHT) as CPDFHighlightAnnotation;
byte[] color = { 0, 255, 0 };
highlight.SetColor(color);
highlight.SetTransparency(120);
highlight.SetQuardRects(rects);
highlight.UpdateAp();

document.Release();
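
The highlight above is created from a single page object, so hits collected from a page range or the whole document would presumably need to be grouped by page before annotating. The sketch below shows one way to do that grouping in plain C#. SearchHit and SearchHits.GroupByPage are hypothetical names introduced for illustration; they are not ComPDFKit types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one match: the page it was found on and its bounds.
public sealed class SearchHit
{
    public SearchHit(int pageIndex, double left, double top, double width, double height)
    {
        PageIndex = pageIndex;
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }

    public int PageIndex { get; }
    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }
}

public static class SearchHits
{
    // Groups hits by zero-based page index, keeping pages in ascending order
    // and hits within a page in the order they were found.
    public static SortedDictionary<int, List<SearchHit>> GroupByPage(IEnumerable<SearchHit> hits)
    {
        if (hits == null)
            throw new ArgumentNullException(nameof(hits));

        var byPage = new SortedDictionary<int, List<SearchHit>>();
        foreach (SearchHit hit in hits)
        {
            List<SearchHit> pageHits;
            if (!byPage.TryGetValue(hit.PageIndex, out pageHits))
            {
                pageHits = new List<SearchHit>();
                byPage.Add(hit.PageIndex, pageHits);
            }
            pageHits.Add(hit);
        }
        return byPage;
    }
}
```

Each group could then be turned into one highlight on its own page, following the same steps as the single-page example.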

 

 

Related Features

 

ComPDFKit offers more operations for processing PDF documents. Related features include:

  • OCR: Help with the scanned PDFs & images that you can’t search for text.
  • Redact: Remove sensitive/personal information irreversibly in PDF files.

 

 

Final Words

 

We will keep developing practical PDF features that bring our customers what they need, and we will continue to share PDF technologies along the way.
