RegexFormat 8 - Unicode Special Edition
Major Version 8 Released 2-8-2017
Explore Unicode 9 with super controls.
This version is built with VS2015 and requires the corresponding MFC / CRT runtime libraries.
They are included in the setup as Merge Modules.
If you have problems with installation, download and install the redistributable from Microsoft.
Alternatively, they are available from the distributable directory:
Available in 32 and 64 bit versions.
A zipped install of the latest version can be downloaded here:
RegexFormat 8 uses crypto services from CryptoAPI.dll. This is usually located in the
Windows\System32 directory. Please ensure that it is installed.
Version History ( Latest build: 8.4 - 2 )
Version 8.4 – 2 1/11/2018 Added the Regex engine type currently selected in the Flags pane to the
Formatted Output tab label (MDI document).
When the engine format type changes, the new engine type is shown in the formatted tab label.
Also, on a change, an arrow indicator is set in the State button text as a reminder
that the regex source needs to be reformatted. The reminder arrow disappears upon
subsequent formatting. This extends the properties to a more easily visible location.
Version 8.4 – 0 12/1/2017 Release minor sub-version 8.4.
A cumulative update (which includes the previous regex engine modifications),
new changes, and some bug fixes.
- Within the Hex Reader dialog:
Replaced “CRLF” metrics and highlighting to encompass all Unicode line breaks.
Modified “Whitespace” metrics and highlighting to encompass all Unicode whitespace.
- A Save All Modified menu item added in the “File” menu. Includes a Yes to all button within the dialog.
- Within the Format regex code, fixed a bug where some whitespace was not getting
escaped when in X-Mode (eXpanded).
- Within the Strings To Regex(Ternary Tree) dialog, fixed a bug in the Simple Factoring algorithm.
Version 8.3 – 8 11/16/2017 Regex engine modifications:
- Allow Back References to undefined groups (not yet parsed).
- Allow Nested Back References.
Note – these are significant changes to the boost regex engine, and these and the other mods
bring it up to par (and performance) with Perl’s regex engine.
Version 8.3 – 7 11/6/2017 Added another new option to the Strings to Regex Ternary Tree Tool:
Do group factoring. Screenshot: Strings to Regex – Ternary Tree
Version 8.3 – 6 10/27/2017 Extended the Leap Year Range to Regex tool’s year range to 0 – 9999.
Version 8.3 – 0 10/4/2017 Release minor sub-version 8.3.
Completed the system wide conversion started in Version 8.2 – 18.
Changes apply to ICU mode (UTF-32) only!
The non-ICU mode regex operations remain unchanged.
Removed the facet overhead (UTF-16 to UTF-32) of searching a target string.
Now uses u32string iterators directly for regex search / replace operations.
This includes using u32string when constructing regex, meaning
surrogate pairs and stand alone surrogates are resolved to UTF-32 codepoints.
Results are correctly mapped / highlighted back to the wide string displays.
Affected code: All places in the application that use ICU mode (UTF-32).
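As a rough illustration of what resolving surrogates to UTF-32 codepoints involves, the Python sketch below combines UTF-16 code units into code points. It is not the application's code: the function name utf16_units_to_code_points is hypothetical, and the sketch only shows the standard UTF-16 pairing arithmetic, with stand-alone surrogates passed through as their own values.

```python
def utf16_units_to_code_points(units):
    """Combine a sequence of UTF-16 code units into UTF-32 code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        # A leading surrogate followed by a trailing surrogate forms one code point.
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP characters and stand-alone surrogates map to themselves.
            out.append(u)
            i += 1
    return out
```

For example, the units 0xD83D 0xDE00 combine to the single code point 0x1F600, while a lone 0xD83D stays 0xD83D.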
Version 8.2 – 25 9/28/2017 Upgraded the regex engine to version 1.65.1
All modifications are carried forward.
Version 8.2 – 23 9/25/2017 Fixed a minor bug where, in certain circumstances, the floating close button
failed to display (when enabled) when hovering the mouse over the MDI tab.
Added / modified features in the Layout->Document & MDI Tabs menu:
- Enable Active Tab Bold Font ( default = false )
- Tab Border Width ( 0 - 5 pixels, default = 2 )
- Text Shading - Inactive View ( None, 20% - 50%, default = 20% )
Version 8.2 – 21 9/22/2017 Introducing a new regex generate tool: Leap Year Range to Regex
A truly accurate tool that lets you generate a custom Leap Year regex given a range of years.
Multiple compression levels are selectable to suit any project and performance preference.
This is the first installment of a Date/Time regex generation suite soon to be available.
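To give a sense of what a leap-year regex looks like, here is a minimal Python sketch for four-digit years 0000 to 9999. It is hand-written for illustration and is not output from the tool; it does not implement custom ranges or compression levels, and the names LEAP_YEAR and is_leap_year_text are hypothetical.

```python
import re

# Last two digits are a non-zero multiple of 4 (04, 08, ..., 96).
_MULT4 = r"(?:0[48]|[2468][048]|[13579][26])"
# Century years (ending in 00) are leap only when the century part
# (00, 04, ..., 96) is itself a multiple of 4, i.e. the year divides by 400.
_CENTURY = r"(?:0[048]|[2468][048]|[13579][26])"

LEAP_YEAR = re.compile(rf"(?:[0-9][0-9]{_MULT4}|{_CENTURY}00)")

def is_leap_year_text(year_text: str) -> bool:
    """True when a four-digit year string is a (proleptic Gregorian) leap year."""
    return LEAP_YEAR.fullmatch(year_text) is not None
```

The sketch treats year 0000 as a leap year, following the plain divisible-by-400 rule.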
Version 8.2 – 20 9/13/2017 Added a new option to the Strings to Regex Ternary Tree Tool:
Convert alternations (?: x | y | z ) to class [ x y z ]
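The idea behind this option can be sketched in a few lines of Python. This is only an assumption about the general technique, not the tool's algorithm: when every alternative is a single character, the alternation collapses into a character class. The function name alternation_to_class is hypothetical.

```python
import re

def alternation_to_class(alternatives):
    """Return a character class when every alternative is one character,
    otherwise fall back to a non-capturing alternation group."""
    if alternatives and all(len(a) == 1 for a in alternatives):
        # Escape the characters that are special inside a class.
        body = "".join("\\" + c if c in "\\]^-[" else c for c in alternatives)
        return "[" + body + "]"
    return "(?:" + "|".join(re.escape(a) for a in alternatives) + ")"
```

With this sketch, the alternatives x, y, z become [xyz], while a mixed list such as ab, c stays an alternation.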
Version 8.2 – 19 9/11/2017 Fixed an issue on the 32-bit version where using MemDC for a virtual list control with more than
500,000 items significantly slowed performance.
The 64-bit version is unaffected. These virtual lists are used to display Unicode names.
Version 8.2 – 18 9/8/2017 General Modifications: Removed the facet overhead (utf16 to utf32) of searching a target string
when in ICU mode. Now uses UString32 iterators directly for regex search operations.
- Benchmark suite, 100% speed increase in ICU flagged regex.
- UCD Interface, 100% speed increase in Custom Rx and CodePoints pages.
Note that the UCD Interface pages now have the full Code Point range available for query.
This includes leading/trailing surrogates and non-characters as well.
Version 8.2 – 14 8/30/2017 Modified Benchmark suite – Added a custom control vertical bar with thumb indicating
current top slot. This is a subtle visual indicator when scrolling slots.
UCD – Custom Rx page, expand the regex input box.
Fixed a minor startup issue on this page.
Version 8.2 – 11 8/22/2017 Regex engine modification: Corrected Non-word boundary construct \B.
Previously, it did not correctly match at the beginning or end of the string if the adjacent
character was a non-word character.
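The expected \B behavior at the string edges can be checked with any conforming engine. The sketch below uses Python's re module as a stand-in reference, not the modified Boost engine shipped with the application.

```python
import re

# "-" is a non-word character, so there is no word boundary between the
# start of the string and "-", nor between "-" and the end of the string.
# \B (non-word boundary) should therefore match at both of those positions.
at_start = re.match(r"\B-", "-abc")
at_end = re.search(r"-\B$", "abc-")

# The word-boundary assertion \b must fail at the same position.
boundary_at_start = re.match(r"\b-", "-abc")
```

Here at_start and at_end are match objects, and boundary_at_start is None.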
Modified the Match Results title to display the regex options used to obtain the last match.
This is an important visual aid to help quickly diagnose possible wrong, invalid or non-matches.
Expanded the Benchmark suite to eight slots available per run.
The suite has been renamed to Mega-Bench 8 to reflect the increase in slots.
Version 8.2 – 6 7/27/2017 Modified Benchmark suite to update an item’s run display result immediately when
its run finishes. Previously, item display results were updated upon completion of the last run.
In the next update we will be adding more item slots (currently there are 2 available for runs).
Version 8.2 – 5 6/21/2017 Added a Mark Location debug option to the Mega-String control.
This option is only enabled for the Parsing function. It adds =► text ◄= marks at
the location where start and end string quote delimiters were parsed and removed.
This option helps diagnose errant string quoting.
Additionally, if the Un-escape delimiters box is checked, it adds a ◆ where the opening or closing
delimiter was removed, or a ◇ indicating no delimiter was found, but should be at this location.
Note that un-escaping escaped delimiters does not involve marking.
This option helps diagnose errant delimited regex.
Marking is available for parsing functions: Single, Double, and No Quoting.
Screenshot: Mega-String : Mark Location
Version 8.2 – 2 5/23/2017 Added Python’s Raw String syntax generation to the Mega-String control.
Options include double r" " or single r' ' quote constructs, as well as optional intelligent
padding already built into the Mega-String control. Optional lines continued + "\n" for multi-line.
Safeguards against an odd number of escapes anywhere in the target as well as at the end of the string,
and provides proper escaping of delimiters.
Screenshot: Mega-String : Python Raw Strings
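The trailing-escape problem mentioned above comes from Python itself: a raw string literal cannot end in an odd number of backslashes. The sketch below shows one possible way to guard against that. It is an assumption for illustration, not the Mega-String control's algorithm, and the function name to_raw_literal is hypothetical. Text containing line breaks or both quote characters simply falls back to repr().

```python
def to_raw_literal(text: str) -> str:
    """Return Python source for `text`, preferring a raw-string literal."""
    # Keep the sketch simple: no multi-line handling.
    if "\n" in text or "\r" in text:
        return repr(text)
    # Pick a quote character that does not occur in the text.
    if '"' not in text:
        quote = '"'
    elif "'" not in text:
        quote = "'"
    else:
        return repr(text)
    trailing = len(text) - len(text.rstrip("\\"))
    if trailing % 2 == 1:
        # An odd run of trailing backslashes would escape the closing quote,
        # so move the last backslash into an adjacent ordinary literal.
        head = "r" + quote + text[:-1] + quote
        tail = quote + "\\\\" + quote
        return head + " " + tail
    return "r" + quote + text + quote
```

For instance, the text C:\dir\ is emitted as r"C:\dir" "\\", which Python joins back into the original string through adjacent-literal concatenation.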
Version 8.2 – 1 5/14/2017 Added Regex Replace Format String Syntax to include Perl, Sed, Literal, and Boost-Extended.
Formerly, by default, the Perl format string was used in replacements with no other options.
This can be set within the Macro Manager dialog just above the replace edit box.
(n/a) 5/4/2017 Updated IIS7 web.config to allow .rxf mime type sample files to be downloaded.
These sample files can now be downloaded from the Download directory.
Version 8.2 – 0 4/24/2017 Release minor sub-version 8.2.
Updated to Regex engine 1.64. All modifications are carried forward.
Version 8.1 – 1 4/19/2017 Regex engine modification to fix a bug in class intersection.
Update to this version if 8.1 – 0 was installed.
Version 8.1 – 0 4/12/2017 Release minor sub-version 8.1.
Regex engine modifications to correctly handle class intersection.
Example: [^\W\D] matches only digits.
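The same class can be tried in other engines that handle it this way. The sketch below uses Python's re module as a stand-in (not the application's modified engine) to show why the negated class reduces to digits.

```python
import re

# The negated class excludes anything that is \W (non-word) or \D (non-digit).
# What remains is the intersection of \w and \d, which is just the digits.
ONLY_DIGITS = re.compile(r"[^\W\D]")

matched = [c for c in "a_5 -9Z" if ONLY_DIGITS.fullmatch(c)]
# matched is ['5', '9']
```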
Version 8.0 – 14 4/1/2017 Modified UCD Property Search to trim whitespace and added an automatic tokenize feature.
If the initial string is not found, the tokenized parts will be searched for instead.
The token delimiters can consist of any of these characters <space> _ - , . ' * " ; \t
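A minimal Python sketch of the tokenizing step, assuming that runs of delimiters are treated as a single separator (the release note does not say either way). The names _DELIMS and tokenize are hypothetical.

```python
import re

# Delimiters listed above: space _ - , . ' * " ; and tab.
_DELIMS = re.compile(r"""[ _\-,.'*";\t]+""")

def tokenize(query: str):
    """Trim surrounding whitespace, then split on runs of delimiter characters."""
    return [token for token in _DELIMS.split(query.strip()) if token]
```

For example, tokenize("  Latin_Extended-A, script.ext ") yields the parts Latin, Extended, A, script, and ext, which could then be searched individually.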
Version 8.0 – 13 3/23/2017 Fixed a Benchmark issue when advancing position on a zero-length match.
In a rare case, this resulted in incorrectly reporting the number of matches on a run.
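Zero-length matches are a classic source of miscounts in match loops. The following Python sketch shows one common way to advance the search position; it illustrates the general technique only and is not the Benchmark suite's code. The function name count_matches is hypothetical.

```python
import re

def count_matches(pattern: str, text: str) -> int:
    """Count successive matches, stepping past zero-length matches."""
    rx = re.compile(pattern)
    count, pos = 0, 0
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        count += 1
        # A zero-length match consumes no input, so step one position
        # forward; otherwise the same empty match would repeat forever.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return count
```

With this loop, the pattern a* against the text baaa reports three matches: an empty match at position 0, aaa, and an empty match at the end.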
Version 8.0 – 9 3/14/2017 Added a Unique page to the UCD Interface dialog.
This has the same functionality as the Codepoints and Custom-Rx pages, except the regex
object is removed. It is instead replaced by an input edit box to paste or type any string.
The string is analyzed for unique codepoints which are displayed in the result.
The result can then be processed using the same features as in the Codepoints and Custom-Rx pages.
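The analysis step can be approximated in Python as shown below. The application uses ICU for its Unicode data; this sketch substitutes Python's unicodedata module, and the function name unique_code_points is hypothetical.

```python
import unicodedata

def unique_code_points(text: str):
    """Return (code point, name) pairs for each distinct character,
    ordered by code point."""
    return [(ord(c), unicodedata.name(c, "<unnamed>")) for c in sorted(set(text))]
```

For the input banana this returns three entries, for the letters a, b, and n.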
Version 8.0 – 8 3/10/2017 Added a Custom-Rx page to the UCD Interface dialog.
This has the same functionality as the Codepoints page, except the regex
object is editable. Thus, any regex construct can be used to obtain a codepoint set.
Properties from the UCD regex cache can be easily added, mixed, and matched within the regex.
Version 8.0 – 6 2/21/2017 Some UCD navigation improvements; the tab control is now prevented from getting focus.
Version 8.0 – 5 2/20/2017 Post-release: Fixed an issue that caused a crash
when trying to drag dockable panes after accessing the UCD names page.
If using a version between 8.0.0 and 8.0.4, it is recommended that it be
upgraded to version 8.0.5.
New Unicode features:
A few ‘Super Controls’ are new - UCD (Unicode Character Database) Interface
using ICU4 58.2. Overhaul of regex engine with full Unicode 9 support, Properties
(over 1200) and Names (0x10FFFF). Includes all scripts and script extensions.
UCD Info Page : UCD Interface Usage
New viewer available from all editors : Uni-Name Viewer
This application parses, dynamically formats/expands/compresses Regular Expressions.
Includes a built-in testing regex engine derived and modified from Boost Regex 1.64.
Includes a regex benchmarking suite.
Uses and includes the ICU4 58.2 Library.
Includes the UCD (Unicode Character Database) Interface, a ‘Super Control’ suite.
Many new controls, including a Unicode Name Viewer to go with the existing Hex Viewer.
View anything from anywhere, it’s integrated into all editors.
See Online Manual (Deprecated)
Its many strong features include formatting, expanding, compressing expressions,
advanced comment handling, auto-generated capture group comments, analysis
tools, padding, Raw/Single/Double quoted String construction of finished expressions
that can be pasted into development code.
Includes independent property views of the current regular expression providing a quick
look at its state and comprehensive construct metrics and error analysis information.
Errors can be selected in different views. For example, when an error is selected from
the view list, it is instantly selected in both the input and output views; when selected
from the output, it is selected in the input and error list, and so on. This makes
debugging quite easy.
Also included is a selectable, completely customizable analysis overlay of conditionals
and capture group counting (including named groups last), as well as annotated error
reporting of the entire expression embedded in the formatted output.
Formatting continues to the end of the expression regardless of errors, thus providing
a single-pass, downstream look after possibly trivial errors.
A Flags pane is provided to easily turn on/off options and settings.
Over 400 internal flag bits control the parsing/formatting engine giving maximum
flexibility to precisely control how the expression is parsed, how it is expanded or
compressed, and the look and shape of the formatted output.
Its solid parsing foundation covers almost all individual constructs available in
Regular Expressions, and each is individually selectable. There are built-in
presets for the major flavors, but everything can be customized, giving the ability to
define custom language presets.
· Java 6
· Java 7
Expressions with embedded ‘expanded’ or ‘compressed’ modes are handled seamlessly
by the engine.
Easily unveil the most complex packed expressions in existence with the click of a button.
Debug, refactor, make changes, then pack it back up for production.
Save the document (.rxf) with all of its views and Flags state, open it later when the
time comes for modification or maintenance or for quick recollection.
Whether a novice or expert, if you use Regular Expressions, this application will save
you hours of work. See it, change it, and maintain it as real code.
Windows XP, Vista, 7, 8, 10
A zipped install of the latest version can be downloaded here
Version 4.2 manual is included in the installation (or available online – see above link),
but can also be downloaded here -> Manual/Help File
Unzip the files to a temporary directory then run the Setup.exe program.
The installed Samples directory contains data files with which to evaluate the application.
Additional miscellaneous samples can be obtained and added to the Samples directory.
Single and Multi-Site License(s) are offered and are now available for purchase.
Accepted payment methods include Major Credit Card or PayPal account.
Questions can be directed to email@example.com
Choose a RegexFormat license purchase option:
· Single License - Price $29 (USD)
· Multi-Site License - Price $25 (USD) / ea., quantity 2-100
(Requires an organization name/address)
A registration key will be emailed to you after the purchase process completes.
RegexFormat – Copyright © 2013 – 2018 RDNC Software