Split a PDF by Bookmarks?

I need to process PDFs that were each created by merging multiple PDFs. In each merged PDF, a bookmark marks the page where each original part starts.

Is there any way to automatically split this up by bookmarks with a script?

We only have the bookmarks to indicate the parts, not the page numbers, so we would need to infer the page numbers from the bookmarks. A Linux tool would be best.

There are programs, such as A-PDF Split, that can do this for you:

A-PDF Split is a very simple, lightning-quick desktop utility program that lets you split any Acrobat pdf file into smaller pdf files. It provides complete flexibility and user control in terms of how files are split and how the split output files are uniquely named. A-PDF Split provides numerous alternatives for how your large files are split: by pages, by bookmarks, and by odd/even page. You can even extract or remove part of a PDF file. A-PDF Split also offers advanced defined splits that can be saved and later imported for use with repetitive file-splitting tasks. A-PDF Split represents the ultimate in file splitting flexibility to suit every need.

A-PDF Split works with password-protected pdf files, and can apply various pdf security features to the split output files. If needed, you can recombine the generated split files with other pdf files using a utility such as A-PDF Merger to form new composite pdf files.

A-PDF Split does NOT require Adobe Acrobat, and produces documents compatible with Adobe Acrobat Reader Version 5 and above.

Edit: I also found a free, open-source program, PDFsam, if you do not want to pay.

pdftk can be used to split the PDF file and extract the page numbers of the bookmarks.

To get the page numbers of the bookmarks, run:

pdftk in.pdf dump_data

and make your script read the page numbers from the output.
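As a rough sketch of that parsing step: the helper below reads dump_data text on stdin and prints the distinct bookmark start pages. It assumes the `BookmarkPageNumber: N` lines that pdftk prints; the function name `bookmark_pages` is made up for illustration and is not part of pdftk.

```shell
#!/bin/bash
# bookmark_pages (hypothetical helper name): read pdftk dump_data text
# on stdin and print the distinct bookmark start pages in ascending
# numeric order, one per line.
bookmark_pages() {
  grep '^BookmarkPageNumber: ' | cut -d' ' -f2 | sort -n -u
}

# Usage (needs pdftk installed):
#   pdftk in.pdf dump_data | bookmark_pages
```

Sorting numerically with `-u` also drops duplicates, which matters when several nested bookmarks point at the same page.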

Then use

pdftk in.pdf cat A-B output out_A-B.pdf

to get the pages from A to B into out_A-B.pdf.
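To turn a list of start pages into the A-B ranges used above, each part ends one page before the next part starts, and the last part runs to `end` (a keyword pdftk accepts for the final page). A minimal sketch, assuming the start pages arrive sorted, one per line; `page_ranges` is a hypothetical name:

```shell
#!/bin/bash
# page_ranges (hypothetical helper name): read ascending start pages on
# stdin, one per line, and print one "A-B" range per part. The last
# part is printed as "A-end".
page_ranges() {
  local prev="" page
  while read -r page; do
    if [ -n "$prev" ]; then
      echo "$prev-$((page - 1))"
    fi
    prev=$page
  done
  if [ -n "$prev" ]; then
    echo "$prev-end"
  fi
}

# Usage: printf '1\n5\n12\n' | page_ranges
```

Each printed range can be passed directly as the argument after `cat`.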

The script could be something like this:

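#!/bin/bash
# Usage: script.sh in.pdf outputprefix

infile=$1       # input PDF
outputprefix=$2 # prefix for the output file names

# Require an existing input file and a non-empty prefix.
[[ -e "$infile" && -n "$outputprefix" ]] || exit 1

# Start page of every bookmark, sorted and deduplicated, followed by
# the literal "end", which pdftk accepts as the last page.
pagenumbers=( $(pdftk "$infile" dump_data |
                grep '^BookmarkPageNumber: ' | cut -f2 -d' ' | sort -n -u)
              end )

for ((i = 0; i < ${#pagenumbers[@]} - 1; ++i)); do
  a=${pagenumbers[i]}   # start page of this part
  b=${pagenumbers[i+1]} # start page of the next part
  [[ "$b" == "end" ]] || b=$((b - 1)) # this part ends one page earlier
  pdftk "$infile" cat "$a-$b" output "${outputprefix}_$a-$b.pdf"
done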
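The script above splits at every bookmark, whatever its nesting level. If only top-level bookmarks should start a new file, the page list can be filtered first. A sketch, assuming each bookmark's `BookmarkLevel:` line comes before its `BookmarkPageNumber:` line in the dump_data output; `top_level_pages` is a hypothetical name:

```shell
#!/bin/bash
# top_level_pages (hypothetical helper name): read pdftk dump_data text
# on stdin and print the start page of each level-1 bookmark.
top_level_pages() {
  awk '
    /^BookmarkLevel: /      { level = $2 }                 # remember current level
    /^BookmarkPageNumber: / { if (level == 1) print $2 }   # keep top level only
  '
}

# Usage (needs pdftk installed):
#   pdftk in.pdf dump_data | top_level_pages
```

Nested bookmarks then stay inside the part of their parent bookmark instead of producing extra files.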

There's a command-line tool written in Java called Sejda; its splitbybookmarks command does exactly what you asked. Because it's Java, it runs on Linux, and because it's a command-line tool, you can call it from a script.

Disclaimer: I'm one of the authors.

Here's a little Perl program I use for the task. Perl isn't special here; the program is just a wrapper around pdftk that reads its dump_data output and turns it into the page ranges to extract:

#!perl
use v5.24;
use warnings;

use Data::Dumper;
use File::Path qw(make_path);
use File::Spec::Functions qw(catfile);

my $pdftk = '/usr/local/bin/pdftk';
my $file = $ARGV[0];
my $split_dir = $ENV{PDF_SPLIT_DIR} // 'pdf_splits';

die "Can't find $ARGV[0]\n" unless -e $file;

# Read the data that pdftk spits out.
open my $pdftk_fh, '-|', $pdftk, $file, 'dump_data'
    or die "Can't run $pdftk: $!\n";

my @chapters;
while( <$pdftk_fh> ) {
    state $chapter = 0;
    next unless /\ABookmark/;

    if( /\ABookmarkBegin/ ) {
        my( $title ) = <$pdftk_fh> =~ /\ABookmarkTitle:\s+(.+)/;
        my( $level ) = <$pdftk_fh> =~ /\ABookmarkLevel:\s+(.+)/;

        my( $page_number ) = <$pdftk_fh> =~ /\ABookmarkPageNumber:\s+(.+)/;

        # I only want to split on chapters, so I skip bookmarks nested
        # below the top level (1 is the top; higher means more nesting).
        next unless $level == 1;

        # If you have front matter (preface, etc) then this numbering
        # will be off. Chapter 1 might be called Chapter 3.
        push @chapters, {
            title         => $title,
            start_page    => $page_number,
            chapter       => $chapter++,
            };
        }
    }

# The end page for one chapter is one before the start page for
# the next chapter. There might be some blank pages at the end
# of the split for PDFs where the next chapter needs to start on
# an odd page.
foreach my $i ( 0 .. $#chapters - 1 ) {
    my $last_page = $chapters[$i+1]->{start_page} - 1;
    $chapters[$i]->{last_page} = $last_page;
    }
$chapters[$#chapters]->{last_page} = 'end';

make_path $split_dir;
foreach my $chapter ( @chapters ) {
    my( $start, $end ) = $chapter->@{qw(start_page last_page)};

    # Slugify the title so it can be used as a filename
    my $title = lc( $chapter->{title} =~ s/[^a-z]+/-/gri );

    my $path = catfile( $split_dir, "$title.pdf" );
    say "Outputting $path";

    # Use pdftk to extract that part of the PDF
    system $pdftk, $file, 'cat', "$start-$end", 'output', $path;
    }
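For anyone who would rather stay in the shell, the slug step from the Perl program can be approximated with `tr`. This is a sketch only: it replaces each run of non-ASCII-letter bytes with a single hyphen and lowercases the result, and the name `slugify` is made up for illustration.

```shell
#!/bin/bash
# slugify (hypothetical helper name): turn a bookmark title into a
# filename-friendly slug. Runs of characters other than ASCII letters
# become one "-", then everything is lowercased.
slugify() {
  printf '%s' "$1" | LC_ALL=C tr -cs 'A-Za-z' '-' | LC_ALL=C tr 'A-Z' 'a-z'
}

# Usage: slugify 'Chapter 1: Intro'
```

As with the Perl version, two bookmarks whose titles differ only in digits or punctuation produce the same slug, so one output file would overwrite the other.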

Comments
  • Any Linux programs that are similar to A-PDF Split?
  • @Jason linux.softpedia.com/get/Printing/Pdfsam-40703.shtml is a link to PDFsam; you can also go to its main page, the second link in my post. It is supposed to be compatible with Linux.
  • Nice :) I'm using grep -A1 '^BookmarkLevel: 1' | grep '^BookmarkPageNumber: ' to obtain only top-level bookmarks. Unfortunately all lower-level bookmarks get lost this way though...
  • I just wanted to mention that this bash script still works fine on macOS Sierra with pdftk. Nicely done!
  • They have a limit of 200 pages.
  • No, there isn't any limit. Please open an issue if you are facing a problem.