ProSEM In Action

Using Image Metadata in Formulas and Scripts

Summary:

Most SEMs store additional information about an SEM image as image metadata, sometimes embedded within the image file itself and in other cases in separate "sidecar" files.  ProSEM reads image metadata when available and makes that information accessible to users either in Formulas in the Variables panel or within Scripts.

 

Seeing Image Metadata

Metadata associated with an image is visible in ProSEM's Image Information Panel. If this panel is not visible, enable it from ProSEM's View menu.

This panel displays two sections of information about the image.

File Information

This section exists for all images, and contains information about the image file itself, including:

  • The file path
  • The pixel count in X and Y
  • The pixel size
  • Image Rotation applied within ProSEM
  • The image field of view

SEM Metadata

This section exists only if the image file has associated metadata, either embedded within the image or in an adjunct sidecar file read by ProSEM. The contents depend entirely on the SEM software and settings, and vary by SEM manufacturer and model.  The Image Information panel shows the available metadata.  Metadata is stored in a dictionary-style data structure, in which each item has a name, or "key", and an associated value.  In many cases, the meaning of the information is obvious; in some cases, however, users may need to consult their SEM manual or applications support for proper interpretation of metadata items.
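As a rough illustration of this key/value structure, the Python sketch below uses a plain dictionary as a stand-in for the metadata of one image. The variable names are hypothetical, and the keys and values are sample data in the style of the examples in this note, not the output of any particular SEM.

```python
# Hypothetical stand-in for an image's SEM metadata: every key maps to a string.
sample_metadata = {
    "WD": "5.3 mm",
    "STAGE_POS": "X=40.4550, Y=11.8194, R=356.5767, Z=4.00, T=0.00",
}

# List each key with its value, in the "key : value" style used in this note.
for key, value in sample_metadata.items():
    print(f"{key} : {value}")

# Look up one item; .get() returns None when the key was never written.
working_distance = sample_metadata.get("WD")
missing_item = sample_metadata.get("WorkingDistance")
```

Which keys are present depends on the SEM, so a lookup that tolerates a missing key is usually the safer choice.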

For example, in SEM images from different manufacturers' tools, the Working Distance might be found in various forms, including:

WorkingDistance : 0.00786781

$$SM_WD : 3.6

WD : 5.3 mm

The interpretation is generally clear, but obviously depends strongly on the tool's software.  Here, the first is expressed in meters, the second in millimeters, and the third has an explicit unit included.  Depending on how the data is to be used, it may be necessary to convert to different units, or remove the units so that a numerical comparison can be made.
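One possible way to bring these three forms to a common unit is sketched below in Python. The function name and the lookup tables are hypothetical; the unit assumed for each key simply follows the interpretation given above and should be checked against the SEM's own documentation.

```python
import re

# Conversion factors to millimeters.
UNIT_TO_MM = {"m": 1000.0, "mm": 1.0}

# Assumed unit per key, following the three examples above.
# None means the value string carries its own unit text.
KEY_UNITS = {"WorkingDistance": "m", "$$SM_WD": "mm", "WD": None}

def working_distance_mm(key, value):
    """Return the working distance in millimeters as a float."""
    match = re.search(r"([\-]?[\d\.]+)\s*([a-zA-Z]*)", value)
    if match is None:
        raise ValueError(f"no number found in {value!r}")
    number = float(match.group(1))
    # Use the assumed unit for the key, else the unit found in the string.
    unit = KEY_UNITS[key] or match.group(2)
    return number * UNIT_TO_MM[unit]

print(working_distance_mm("WorkingDistance", "0.00786781"))
print(working_distance_mm("$$SM_WD", "3.6"))
print(working_distance_mm("WD", "5.3 mm"))
```

Once every value is a float in the same unit, numerical comparisons across images from different tools become straightforward.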

Using Metadata

All Metadata values are reported as Strings, even when the data is numeric. For some uses, the string representation is fine, for example, if the task is just to print the metadata value for labeling or identification. In other applications, it is necessary to process the metadata value in one or more ways:

  • Convert the data into a numeric type, for example for comparison, or bounds checking
  • Extract just a portion of the metadata string
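The two kinds of processing listed above can be sketched in plain Python as follows. The helper name, the sample strings, and the bounds are hypothetical and only stand in for real metadata values.

```python
def within_bounds(text, low, high):
    """Convert a purely numeric metadata string to a float and range-check it."""
    value = float(text)  # raises ValueError if text is not a plain number
    return low <= value <= high

# Numeric conversion followed by a bounds check (sample value and limits).
in_range = within_bounds("3.6", 3.0, 5.0)

# Extracting a portion of a string: keep only the text before the first space.
number_part = "5.3 mm".split(" ")[0]
```

Here `float()` fails loudly on a value such as "5.3 mm", which is why the portion of interest is extracted first whenever the string contains units or other text.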

These examples show some of the common methods of processing metadata to extract useful information about an image or measurement. The regular expression usage is given for both Formulas (JavaScript) and Scripts (Python).  The syntax of the Regular Expressions themselves is the same, but the methods for applying those regular expressions to find and extract information from file or image metadata differ.

Each example lists the metadata content, the desired result, the Formula (JavaScript), the Script (Python), and notes.

Extract numeric value from string, method 1: fixed position
  Metadata content:     5.32 mm
  Desired result:       5.32
  Formula (JavaScript): Number(image.metadata['WD'].substring(0,4))
  Script (Python):      float(image.metadata['WD'][0:4])
  Notes: Simple, but the least flexible method; it depends on the metadata string always having exactly the same number of digits.

Extract numeric value from string, method 2: using regular expression
  Metadata content:     5.32 mm
  Desired result:       5.32
  Formula (JavaScript): Number(/([\-]?[\d\.]+)/.exec(image.metadata['WD'])[1])
  Script (Python):
      m = re.search(r'([\-]?[\d\.]+)', image.metadata['WD'])
      float(m.group(1))
  Notes: Using regex, finds a sequence of digits with an optional decimal point and an optional leading negative sign.

Extract numeric stage position using regex
  Metadata content:     X=40.4550, Y=11.8194, R=356.5767, Z=4.00, T=0.00
  Desired result:       11.8194
  Formula (JavaScript): Number(/Y=([\-]?[\d\.]+?),/.exec(image.metadata['STAGE_POS'])[1])
  Script (Python):
      m = re.search(r'Y=([\-]?[\d\.]+?),', image.metadata['STAGE_POS'])
      float(m.group(1))
  Notes: Using regex, finds a sequence of digits, with an optional decimal point or negative sign, immediately following the string "Y=".

Extract data from image filename
  Metadata content:     Dose_140
  Desired result:       140
  Formula (JavaScript): Number(/_([\d\.]+)$/.exec(image.label)[1])
  Script (Python):
      m = re.search(r'_([\d\.]+)$', image.label)
      float(m.group(1))
  Notes: Using regex, finds the digits following an underscore character, up to the end of the image name.

Extract data from image filename
  Metadata content:     WFR_126A_DOS_0.70_DEN_025_25
  Desired result:       0.70
  Formula (JavaScript): Number(/_DOS_([\d\.]+)_/.exec(image.label)[1])
  Script (Python):
      m = re.search(r'_DOS_([\d\.]+)_', image.label)
      float(m.group(1))
  Notes: Using regex, finds the digits following the literal string "_DOS_".

 

In Scripts, the regular expression library must be imported in the script header with:

import re
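A minimal sketch of how the Script examples might be wrapped so that a non-matching string does not stop the script: Python's re.search returns None when nothing matches, so calling group(1) directly would raise an error. The helper name extract_number is hypothetical, and the sample strings stand in for image.metadata[...] and image.label.

```python
import re

def extract_number(pattern, text):
    """Return the first captured group as a float, or None if no match."""
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None

# Sample strings copied from the examples in this note.
stage_pos = "X=40.4550, Y=11.8194, R=356.5767, Z=4.00, T=0.00"
label = "WFR_126A_DOS_0.70_DEN_025_25"

stage_y = extract_number(r"Y=([\-]?[\d\.]+?),", stage_pos)
dose = extract_number(r"_DOS_([\d\.]+)_", label)

# A label without the "_DOS_" marker gives None instead of an exception.
missing = extract_number(r"_DOS_([\d\.]+)_", "Dose_140")
```

Returning None leaves the decision about a missing value to the calling script, for example skipping the image or reporting it.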

 

 

Regular Expressions (regex): Really Brief Overview

Regular Expressions are a very common method for string processing, but can also be a bit obscure and complex.  There are many online resources for learning about Regular Expressions, but for ProSEM use, many tasks can be accomplished with just a very small subset of the capabilities, summarized here:

General form for applying a regular expression to a string:

EXPR.exec(STRING)[1]

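This applies the regular expression EXPR to the string STRING, and returns the first captured group (index [0] holds the full match).  This is the Formula (JavaScript) form; in a Python Script the equivalent is re.search(EXPR, STRING).group(1), with EXPR written as a raw string rather than between slashes.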

EXPR is a regular expression literal, which generally consists of the pattern to be matched, enclosed in slashes. For example, /ProSEM/ will match the string "ProSEM" if it is contained in the character string STRING.  When working with SEM metadata, it is often useful to extract just portions of the full metadata string.  For this, one or more 'groups' are defined in the regex; groups are enclosed in parentheses, and usually match content using a set of metacharacters within the parentheses.

Grouping Example:

/ProSEM v(\d\.\d\.\d)/  when applied to the string "ProSEM v2.8.4" will return the grouped match "2.8.4".  The portion of the expression outside the grouping parentheses matches the literal 'ProSEM v', and the portion inside the parentheses consists of a series of special metacharacters, each starting with a backslash '\'.  Here the pattern \d\.\d\.\d looks for a digit, then a period, then another digit, then another period and then a final digit. Note that this simple example will only match single digits, so if the version string were 2.10.1, this would not match and no value would be returned.  The expression can be modified to match one or more digits in each location by adding a quantifier to the digit metacharacter: the '+' quantifier after '\d' matches one or more digits in a row, so the expression:

EXPR=/ProSEM v(\d+\.\d+\.\d+)/

will match either "ProSEM v2.8.4" or "ProSEM v2.12.15"
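The difference between the single-digit pattern and the '+' version can be checked with a short Python sketch. The helper name version_of is hypothetical; the patterns and version strings are the ones discussed above, written as Python raw strings.

```python
import re

SINGLE_DIGIT = r"ProSEM v(\d\.\d\.\d)"
MULTI_DIGIT = r"ProSEM v(\d+\.\d+\.\d+)"

def version_of(pattern, text):
    """Return the captured version string, or None when the pattern fails."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

for text in ("ProSEM v2.8.4", "ProSEM v2.12.15"):
    for pattern in (SINGLE_DIGIT, MULTI_DIGIT):
        print(f"{pattern} applied to {text!r}: {version_of(pattern, text)}")
```

The single-digit pattern fails on "ProSEM v2.12.15" because the character after "2.1" is a digit rather than the period the pattern requires.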

This expression could also be modified to match the three individual portions of the version number individually, so that this expression:

EXPR=/ProSEM v(\d+)\.(\d+)\.(\d+)/

when matched against a version string of the correct form, such as "ProSEM v2.8.4", now yields three distinct captured groups, each returned as a string:

EXPR.exec(metadata_string)[1] returns "2", the first grouped match,
EXPR.exec(metadata_string)[2] returns "8", the second grouping, and
EXPR.exec(metadata_string)[3] returns "4".
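In a Python Script, the same three groups could be read roughly as follows. This is a sketch using the sample version string from above; the variable names are hypothetical.

```python
import re

match = re.search(r"ProSEM v(\d+)\.(\d+)\.(\d+)", "ProSEM v2.8.4")

# group(n) returns the n-th captured group as a string.
first = match.group(1)

# groups() returns all captured groups at once; convert each to an integer.
major, minor, patch = (int(part) for part in match.groups())
```

Converting to integers allows numeric comparisons, which is not reliable with the raw strings (for example, "10" sorts before "9" as text).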

 

Regular Expressions (regex) Most Common Elements

Match Character(s)   Meaning                                              Notes
\s                   Character: match white space (spaces, tabs)
\d                   Character: match a digit, 0-9
.                    Character: match any character
^                    Anchor to start of string
$                    Anchor to end of string
*                    Quantifier: match 0 or more of the preceding match
+                    Quantifier: match 1 or more of the preceding match
( )                  Grouping: match and capture enclosed pattern         For multiple matches, the results are available by indexing the full results, i.e. the first match is returned as index [1], the second as index [2], and so on.
[abc]                Range: a or b or c                                   [a-j] matches a letter from a to j; [0-6] matches a digit from 0 to 6
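The elements listed above can be tried out in a few lines of Python. This is only a sketch; the sample string and variable names are hypothetical and do not come from any particular SEM.

```python
import re

text = "Tilt 30 deg"  # hypothetical sample string

starts_with_tilt = re.search(r"^Tilt", text) is not None  # ^ anchors to the start
ends_with_deg = re.search(r"deg$", text) is not None      # $ anchors to the end
digits = re.search(r"\s(\d+)\s", text).group(1)           # \s, \d, + and ( )
low_digit = re.search(r"[0-6]", text).group(0)            # first character in the range 0-6
first_three = re.search(r"^(...)", text).group(1)         # . matches any character
optional = re.search(r"Tilt\s*(\d*)", text).group(1)      # * allows zero or more repeats
```

Each line exercises one or two of the elements, so changing the sample string is a quick way to see when a pattern stops matching.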