Regular Expressions for extracting Metadata

Updated on Dec 18, 2025

In this article, you will learn

which Regular Expressions are useful for extracting information from file names, and
how these should be structured for different examples.

1. Introduction

Regular Expressions can be used to extract metadata directly from the file name. If the file name contains recurring information, such as dimensions, the number of copies, or the Bleed, these values can be automatically recognized and transferred to the metadata of the Print Item. The extracted information is then available at the corresponding locations in the application for further processing.

2. Search and Extract

As with most tasks, there are many possible approaches when working with regular expressions. The following examples illustrate how to define regular expressions for extracting metadata.

Example 1

Structure of the file name: Craft Beer Magenta 400x1000mm_#2000_B2

The following information can be extracted from this file name:

The file has a width of 400 mm.
The file has a height of 1000 mm.
2000 copies of the print file are to be produced.
The file must include a Bleed of 2 mm.

The following regular expressions can be defined to extract the correct information:

\d{2,3}(?=x) – search for the width of the Print Item:

matches two- to three-digit numbers
followed by an x.

\d{2,5}(?=m) – search for the height of the Print Item:

matches two- to five-digit numbers
followed by an m.

(?<=#)\d{2,5} – search for the number of copies:

matches two- to five-digit numbers
preceded by a #.

(?<=B)\d{1,2} – search for the Bleed value:

matches one- to two-digit numbers
preceded by a B.
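
The four expressions above can be tried outside the application. The following is a minimal sketch in Python, assuming that Python's `re` module treats these patterns the same way as the application's regular expression engine (not confirmed here). The field names and the `extract` helper are hypothetical and chosen only for illustration.

```python
import re

file_name = "Craft Beer Magenta 400x1000mm_#2000_B2"

# Hypothetical field names mapped to the expressions from Example 1.
patterns = {
    "width": r"\d{2,3}(?=x)",    # digits followed by an x
    "height": r"\d{2,5}(?=m)",   # digits followed by an m
    "copies": r"(?<=#)\d{2,5}",  # digits preceded by a #
    "bleed": r"(?<=B)\d{1,2}",   # digits preceded by a B
}


def extract(name, patterns):
    """Return the first match of each pattern, or None if nothing matches."""
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, name)
        result[field] = match.group(0) if match else None
    return result


metadata = extract(file_name, patterns)
print(metadata)
```

The B in "Beer" is not picked up as a Bleed value because it is not directly followed by a digit.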

Example 2

Structure of the file name: Filename_210x297mm_#200_P2_B5mm

The following information can be extracted from this file name:

"Filename" is the name of the file.
The file has a width of 210 mm.
The file has a height of 297 mm.
200 copies should be produced from the print file.
The file contains two pages.
The file should have a Bleed of 5 mm.

The following regular expressions can be defined to extract the required information:

^[^_]+ – search for the file name:

anchors the match at the beginning of the string, and
matches one or more characters that are not an underscore, i.e. everything up to the first underscore.

(?<=_)\d{2,4} – search for the width of the Print Item:

matches two- to four-digit numbers
that follow an underscore _ .

(?<=x)\d{2,5}(?=m) – search for the height of the Print Item:

matches two- to five-digit numbers
that follow an x , and
that precede an m.

(?<=P)\d{1,4} – search for the number of pages of the Print Item:

matches one- to four-digit numbers
that follow a P.

(?<=#)\d{2,5}(?=_) – search for the number of copies:

matches two- to five-digit numbers
that follow a # , and
that precede an underscore _ .

(?<=B)\d{1,4}(?=m) – search for the Expected Bleed:

matches one- to four-digit numbers
that follow a B, and
that precede an m.
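
As a way to check the expressions for Example 2, the sketch below runs them against the sample file name in Python. It assumes that "follows" conditions are written as lookbehinds `(?<=...)` and "precedes" conditions as lookaheads `(?=...)`, and that Python's `re` module is a fair stand-in for the application's engine. The field names are hypothetical.

```python
import re

file_name = "Filename_210x297mm_#200_P2_B5mm"

# Hypothetical field names; lookbehind (?<=...) expresses "follows",
# lookahead (?=...) expresses "precedes".
patterns = {
    "name": r"^[^_]+",
    "width": r"(?<=_)\d{2,4}",
    "height": r"(?<=x)\d{2,5}(?=m)",
    "pages": r"(?<=P)\d{1,4}",
    "copies": r"(?<=#)\d{2,5}(?=_)",
    "bleed": r"(?<=B)\d{1,4}(?=m)",
}

metadata = {}
for field, pattern in patterns.items():
    match = re.search(pattern, file_name)
    metadata[field] = match.group(0) if match else None

print(metadata)

# A lookahead placed in front of the digits cannot match: (?=_) requires the
# next character to be "_", while \d requires that same character to be a digit.
lookahead_result = re.search(r"(?=_)\d{2,4}", file_name)
print(lookahead_result)
```

The last two lines illustrate why the direction of the assertion matters: a condition on the character before the number needs a lookbehind.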

Example 3

Structure of the file name: 100100_210x297mm_#500_P5_B3mm_filename

The following information can be extracted from this file name:

The order number, which should be stored as the "External ID".
The file has a width of 210 mm.
The file has a height of 297 mm.
500 copies of the print file should be produced.
The file consists of five pages.
The file should have a Bleed of 3 mm.
The file name, which will be used as the unique identifier of the order line.

The following regular expressions can be defined to extract the required information:

^\d+ – search for the External ID:

matches one or more digits
appearing at the beginning of the string.

(?<=_B)\d+ – search for the Expected Bleed:

matches one or more digits
appearing after an underscore _ followed by a B.

(?<=_#)\d+ – search for the number of copies:

matches one or more digits
appearing after an underscore _ followed by a #.

(?<=_)\d+(?=x) – search for the width of the Print Item:

matches one or more digits
appearing after an underscore _ and
appearing before an x.

(?<=x)\d+(?=mm) – search for the height of the Print Item:

matches one or more digits
appearing after an x and
appearing before mm.

[^_]+$ – search for the file name:

matches one or more characters that are not an underscore _ (i.e. everything after the last underscore), and
appearing at the end of the string.
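
The expressions for Example 3 can be combined into one lookup that also converts the numeric values. This is a sketch only: the field names, the `NUMERIC_FIELDS` set and the `extract_metadata` function are hypothetical, the "appearing after" conditions are written as lookbehinds `(?<=...)`, and Python's `re` module is assumed to behave like the application's engine.

```python
import re

file_name = "100100_210x297mm_#500_P5_B3mm_filename"

# Hypothetical field names mapped to the expressions from Example 3.
patterns = {
    "external_id": r"^\d+",
    "bleed_mm": r"(?<=_B)\d+",
    "copies": r"(?<=_#)\d+",
    "width_mm": r"(?<=_)\d+(?=x)",
    "height_mm": r"(?<=x)\d+(?=mm)",
    "name": r"[^_]+$",
}

# Fields converted to integers; the IDs and names stay strings.
NUMERIC_FIELDS = {"bleed_mm", "copies", "width_mm", "height_mm"}


def extract_metadata(name):
    """Apply every pattern and return the first match per field (or None)."""
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, name)
        if match is None:
            result[field] = None
        elif field in NUMERIC_FIELDS:
            result[field] = int(match.group(0))
        else:
            result[field] = match.group(0)
    return result


metadata = extract_metadata(file_name)
print(metadata)
```

Keeping the External ID as a string avoids losing leading zeros if order numbers ever start with 0.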

Article Update: Workflow 1.21.1 – 09/2025
