
Regular Expressions for Extracting Metadata


In this article, you will learn

  • which Regular Expressions are useful for extracting information from file names, and
  • how to structure them for different file name patterns.

1. Introduction

Regular Expressions can be used to extract metadata directly from the file name. If the file name contains recurring information, such as dimensions, the number of copies, or the Bleed, these values can be automatically recognized and transferred to the metadata of the Print Item. The extracted information is then available at the corresponding locations in the application for further processing.

2. Search and Extract

As with most tasks involving regular expressions, the same result can usually be achieved in several ways. The following examples show one possible set of regular expressions for each file name structure.

Example 1

Structure of the file name: Craft Beer Magenta 400x1000mm_#2000_B2

The following information can be extracted from this file name:

  • The file has a width of 400 mm.
  • The file has a height of 1000 mm.
  • 2000 copies of the print file are to be produced.
  • The file must include a Bleed of 2 mm.

The following regular expressions can be defined to extract the correct information:

\d{2,3}(?=x) – search for the width of the Print Item:

  • matches two- to three-digit numbers
  • followed by an x.

\d{2,5}(?=m) – search for the height of the Print Item:

  • matches two- to five-digit numbers
  • followed by an m.

(?<=#)\d{2,5} – search for the number of copies:

  • matches two- to five-digit numbers
  • preceded by a #.

(?<=B)\d{1,2} – search for the Bleed value:

  • matches one- to two-digit numbers
  • preceded by a B.
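You can test these expressions against the sample file name before entering them in the application, for example with Python's `re` module. The following is a minimal sketch, assuming the application's regular expression engine handles lookahead `(?=...)` and lookbehind `(?<=...)` the same way Python does. The field names (`width_mm`, `copies`, …) and the `extract` helper are hypothetical and are not part of the product.

```python
import re
from typing import Dict, Optional

# Sample file name from Example 1.
FILE_NAME = "Craft Beer Magenta 400x1000mm_#2000_B2"

# Hypothetical field names mapped to the expressions listed above.
PATTERNS = {
    "width_mm": r"\d{2,3}(?=x)",    # digits followed by "x"
    "height_mm": r"\d{2,5}(?=m)",   # digits followed by "m"
    "copies": r"(?<=#)\d{2,5}",     # digits preceded by "#"
    "bleed_mm": r"(?<=B)\d{1,2}",   # digits preceded by "B"
}


def extract(file_name: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the first match of each pattern, or None if a pattern does not match."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, file_name)
        result[field] = match.group(0) if match else None
    return result


print(extract(FILE_NAME, PATTERNS))
# {'width_mm': '400', 'height_mm': '1000', 'copies': '2000', 'bleed_mm': '2'}
```

`re.search` returns only the leftmost match of each pattern, so if a file name contained two numbers followed by an x, the width expression would pick up the first one.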

Example 2

Structure of the file name: Filename_210x297mm_#200_P2_B5mm

The following information can be extracted from this file name:

  • "Filename" is the name of the file.
  • The file has a width of 210 mm.
  • The file has a height of 297 mm.
  • 200 copies should be produced from the print file.
  • The file contains two pages.
  • The file should have a Bleed of 5 mm.

The following regular expressions can be defined to extract the required information:

^[^_]+ – search for the file name:

  • starts at the beginning of the string, and
  • matches one or more characters that are not an underscore, i.e. everything up to the first underscore.

(?<=_)\d{2,4} – search for the width of the Print Item:

  • matches two- to four-digit numbers
  • that follow an underscore _.

(?<=x)\d{2,5}(?=m) – search for the height of the Print Item:

  • matches two- to five-digit numbers
  • that follow an x, and
  • that precede an m.

(?<=P)\d{1,4} – search for the number of pages of the Print Item:

  • matches one- to four-digit numbers
  • that follow a P.

(?<=#)\d{2,5}(?=_) – search for the number of copies:

  • matches two- to five-digit numbers
  • that follow a #, and
  • that precede an underscore _.

(?<=B)\d{1,4}(?=m) – search for the Expected Bleed:

  • matches one- to four-digit numbers
  • that follow a B, and
  • that precede an m.
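The sketch below runs the Example 2 expressions with Python's `re` module, assuming the application's engine treats lookarounds the same way Python does; the field names and the `extract` helper are hypothetical. Wherever the description says a value *follows* a character, it uses a lookbehind `(?<=...)`. Wherever a value *precedes* a character, it uses a lookahead `(?=...)`. The last line shows why the direction matters. A lookahead placed in front of `\d` requires the same character to be both an underscore and a digit, so that pattern never matches.

```python
import re
from typing import Dict, Optional

# Sample file name from Example 2.
FILE_NAME = "Filename_210x297mm_#200_P2_B5mm"

# Lookbehind (?<=...) for "follows", lookahead (?=...) for "precedes".
PATTERNS = {
    "name": r"^[^_]+",                  # everything up to the first underscore
    "width_mm": r"(?<=_)\d{2,4}",
    "height_mm": r"(?<=x)\d{2,5}(?=m)",
    "pages": r"(?<=P)\d{1,4}",
    "copies": r"(?<=#)\d{2,5}(?=_)",
    "bleed_mm": r"(?<=B)\d{1,4}(?=m)",
}


def extract(file_name: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the first match of each pattern, or None if a pattern does not match."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, file_name)
        result[field] = match.group(0) if match else None
    return result


print(extract(FILE_NAME, PATTERNS))
# {'name': 'Filename', 'width_mm': '210', 'height_mm': '297', 'pages': '2', 'copies': '200', 'bleed_mm': '5'}

# A lookahead in front of \d asks one character to be both "_" and a digit,
# so this variant can never match.
print(re.search(r"(?=_)\d{2,4}", FILE_NAME))  # None
```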

Example 3

Structure of the file name: 100100_210x297mm_#500_P5_B3mm_filename

The following information can be extracted from this file name:

  • The order number, which should be stored as the "External ID".
  • The file has a width of 210 mm.
  • The file has a height of 297 mm.
  • 500 copies of the print file should be produced.
  • The file consists of five pages.
  • The file should have a Bleed of 3 mm.
  • The file name, which will be used as the unique identifier of the order line.

The following regular expressions can be defined to extract the required information:

^\d+ – search for the External ID:

  • matches one or more digits
  • at the beginning of the string.

(?<=_B)\d+ – search for the Expected Bleed:

  • matches one or more digits
  • that follow _B (an underscore followed by a B).

(?<=_#)\d+ – search for the number of copies:

  • matches one or more digits
  • that follow _# (an underscore followed by a #).

(?<=_)\d+(?=x) – search for the width of the Print Item:

  • matches one or more digits
  • that follow an underscore _, and
  • that precede an x.

(?<=x)\d+(?=mm) – search for the height of the Print Item:

  • matches one or more digits
  • that follow an x, and
  • that precede mm.

[^_]+$ – search for the file name:

  • matches one or more characters that are not an underscore
  • at the end of the string, i.e. everything after the last underscore.
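A Python sketch for Example 3, assuming Python's `re` treats lookarounds like the application's engine. The field names and the `extract` helper are hypothetical. The sketch also converts the numeric values to integers while keeping the External ID as text. Example 3 lists no expression for the page count, so the `pages` pattern `(?<=_P)\d+` is a hypothetical addition written in the same style as the others.

```python
import re
from typing import Dict, Optional, Union

# Sample file name from Example 3.
FILE_NAME = "100100_210x297mm_#500_P5_B3mm_filename"

PATTERNS = {
    "external_id": r"^\d+",
    "width_mm": r"(?<=_)\d+(?=x)",
    "height_mm": r"(?<=x)\d+(?=mm)",
    "copies": r"(?<=_#)\d+",
    "pages": r"(?<=_P)\d+",  # hypothetical: not listed in Example 3
    "bleed_mm": r"(?<=_B)\d+",
    "name": r"[^_]+$",       # everything after the last underscore
}

# Fields converted to integers; the External ID stays text so that
# leading zeros in an order number would be preserved.
NUMERIC_FIELDS = {"width_mm", "height_mm", "copies", "pages", "bleed_mm"}


def extract(file_name: str) -> Dict[str, Optional[Union[int, str]]]:
    result: Dict[str, Optional[Union[int, str]]] = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, file_name)
        if match is None:
            result[field] = None  # value not present in this file name
        elif field in NUMERIC_FIELDS:
            result[field] = int(match.group(0))
        else:
            result[field] = match.group(0)
    return result


print(extract(FILE_NAME))
# {'external_id': '100100', 'width_mm': 210, 'height_mm': 297, 'copies': 500, 'pages': 5, 'bleed_mm': 3, 'name': 'filename'}
```

If a value is missing from the file name, its entry is `None`. For instance, `extract("100100_210x297mm_#500_B3mm_filename")["pages"]` is `None` because that name has no `_P` segment.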

Article Update: Workflow 1.21.1 – 09/2025
