
 

           

RegexFormat 8  -  Unicode Special Edition

 

 

Version 8.5    Released 1-21-2018

Explore Unicode 10  with super controls.

 

This version is built with VS2015 and requires the VS2015  MFC / CRT  runtime libraries.

They are included in the setup as Merge Modules.

If you have problems with installation, download and install the redistributable from Microsoft.

Alternatively, they are available from the distributable directory:

vc_redist.x64.exe   or   vc_redist.x86.exe

 

 

Available in 32 and 64 bit versions.

 

_________________________________________________________________________________

Quick Download

A zipped install of the latest version can be downloaded here:

32-bit : Version 8   and   64-bit : Version 8,   or from the Download directory.

 

_________________________________________________________________________________

            Quick Links:

Download  or  Samples  directory,     v7  or  v6   history

 

_________________________________________________________________________________

   Important Note(s):

RegexFormat 8 uses crypto services from CryptoAPI.dll. This is usually  located in the

Windows\System32 directory. Please ensure that it is installed.

 

_________________________________________________________________________________

Version History   ( Latest build:  8.9 - 2 )

 

 

 

Version 8.9 – 2        4/14/2018          The new Strings To RegexTernary  and  Mega-Convert  tools were put to use

                                                       creating Utf-16 and Utf-32 Emoji regex. These regex will match all the emoji strings

                                                       specified in the current V11 of Unicode. The regex are created from the emoji-ordering.txt

                                                       file obtained from the Unicode.org site.

 

                                                       The new samples can be obtained in the _Samples directory under the Emoji sub-directory.

                                                       There is a starter file Machine-readable emoji ordering v11.0 containing instructions

                                                       to get these regex in a semi-automated way, with just a couple of key strokes.

 

                                                       When more emoji are added, just generate the new regexes.

                                                       For quick reference, this is a link to a text version of that file:

                                                       Machine-readable emoji ordering v11.0.txt.

 

                                                       Screen shot :  Tern Tool

 

 

Version 8.9 – 2        4/14/2018          Upgraded:   the new Mega-Convert Tool to include two new options.                                                      

                                                       These options are for the conversion From Method and Syntax.

                                                       - Added a  Normalize Utf-16/32 Hex  method option to the from combo box.

                                                       This will run both the 16 /32 methods in one operation. It is equivalent to running

                                                       the selected syntax form as they exist in both modes.

                                                       - Added an  All Syntaxes  syntax option to the from syntax combo box.

                                                       This will run all syntaxes available for the selected from method in one operation.

 

                                                       When used together, it enables the From Conversion to become:

                                                       “Normalize Utf-16/32 Hex” using “All Syntaxes”,  converting to a single To Conversion

                                                       form.  This is a potent combination.

                                                       Screen shot :  Mega-Convert                                                  

 

 

Version 8.9 – 0      4/9/2018            Release minor sub-version 8.9.

                                                      

                                                       Upgraded:   the regex engine to accept Single-Name property shortcuts for types :

                                                       Binary, General_Category, General_Category_Mask,  Script and Block.

                                                       For these types, the other way \p{Type=Value} is still supported.

                                                       This is a system-wide modification that is honored by all the sub-systems.

                                                       For the UCD-Interface Tool, a checkbox flag selects whether or not to use the shortcut

                                                       regex names when adding properties to the cache.

                                                       This upgrade adds maximum flexibility to the use of properties in regex constructs.

 

                                                       Upgraded:   the new Mega-Convert Tool to include option for the printable,

                                                       non-control ascii range ( 0x21 – 0x7F ). There was already an extended ascii range option.

                                                       Screen shot :  Mega-Convert  

 

 

Version 8.8 – 0      4/6/2018            Release minor sub-version 8.8.

                                                      

                                                       Upgraded:   the  Strings To Regex – Ternary Tree   tool.

                                                       Selectable UTF-16/32 processing. Many new options,

                                                       including an Analyze data feature that shows the UTF-16/32 metrics

                                                       along with giving recommendations of options to get the best outcome.

                                                       The upgraded tool also features deep aggressive factoring options.

                                                       Screen shot :  Tern Tool
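The general idea of turning a list of strings into one factored regex can be sketched as follows. This is a minimal illustration only: it uses a plain prefix trie rather than the tool's ternary tree, shares common prefixes only, and the names `build_trie` and `trie_to_regex` are hypothetical.

```python
import re

def build_trie(words):
    """Build a nested-dict prefix trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Emit a regex for the trie, factoring out shared prefixes."""
    is_end = "" in node
    alts = [re.escape(ch) + trie_to_regex(child)
            for ch, child in sorted(node.items()) if ch != ""]
    if not alts:
        return ""
    if len(alts) == 1 and not is_end:
        return alts[0]
    body = "(?:" + "|".join(alts) + ")"
    # If a word ends here, everything that follows is optional.
    return body + "?" if is_end else body

words = ["car", "cart", "cat", "dog"]
pattern = re.compile(trie_to_regex(build_trie(words)))
```

Here `pattern.fullmatch` accepts each listed word and rejects partial prefixes such as "ca".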

 

                                                       Fixed:   When using the Replace-All feature within

                                                       the Find/Replace paradigm with ICU-mode (UTF-32) set, the flag on the

                                                       target mirror buffer was not being set correctly.

                                                       This flag converts the target buffer to/from  u16string / u32string as needed.

                                                       This was an oversight when the whole system was converted over.

                                                       It now works correctly and without issues.

 

 

Version 8.6 – 0      3/27/2018          Release minor sub-version 8.6.

                                                      

                                                       Added a new tool:   Mega-Convert.

                                                       Converts between any Unicode/Hex/Codepoint Notation methods,

                                                       within any syntax and format. Truly a remarkable tool.

                                                       Operates on any of the available input edit box formats.

                                                       Screen shot :  Mega-Convert  
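As a rough illustration of one such conversion, the sketch below turns a code point written as U+XXXX into UTF-16 and UTF-32 escape notations. The function names and the specific output syntaxes (`\uXXXX` and `\x{X}`) are assumptions chosen for the example, not the tool's actual option list.

```python
def parse_u_plus(text: str) -> int:
    """Parse 'U+1F600' (or bare hex) into an integer code point."""
    digits = text[2:] if text.upper().startswith("U+") else text
    cp = int(digits, 16)
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    return cp

def to_utf16_escapes(cp: int) -> str:
    """Return \\uXXXX, or a surrogate pair for code points above U+FFFF."""
    if cp < 0x10000:
        return f"\\u{cp:04X}"
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return f"\\u{high:04X}\\u{low:04X}"

def to_utf32_escape(cp: int) -> str:
    """Return the code point in \\x{...} hex notation."""
    return f"\\x{{{cp:X}}}"
```

For example, `to_utf16_escapes(parse_u_plus("U+1F600"))` yields the surrogate pair `\uD83D\uDE00`, while `to_utf32_escape` yields `\x{1F600}`.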

 

 

Version 8.5 –8         3/11/2018          Mega-String  modifications.

                                                       Parsing is extended to the four C-Style string options.

                                                       Includes optional stripping of  _T(“”) macros and parsing of the nine trigraphs.

                                                       See the documentation for Mega-String Tool for an overview of functionality.

                                                       Previously, the C-Style string parsing was generalized in the Double Quote parse option.

                                                      

Version 8.5 –6         2/21/2018          Added class range combining to Num Range Generator tool.

 

Version 8.5 –5         2/19/2018          Added a new Benchmark Result item -   Matches Per Second.

                                                       This is a calculated average obtained by this formula:

                                                       Matches per iteration * Total iterations  / Total run time microseconds ( converted to seconds ).
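The formula above can be written out as a small function. This is a sketch of the arithmetic only; the function and parameter names are hypothetical.

```python
def matches_per_second(matches_per_iteration: int,
                       total_iterations: int,
                       total_run_time_us: float) -> float:
    """Average matches per second from a benchmark run timed in microseconds."""
    if total_run_time_us <= 0:
        raise ValueError("run time must be positive")
    seconds = total_run_time_us / 1_000_000  # microseconds to seconds
    return matches_per_iteration * total_iterations / seconds
```

For instance, 3 matches per iteration over 50,000 iterations in 250,000 microseconds gives 150,000 matches in 0.25 seconds, or 600,000 matches per second.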

                                                      

Version 8.5 –4         1/28/2018          Added a new format option under global expansion.

                                                       When global expansion is disabled, no expansion takes place.

                                                       The new option enables (default) or disables group syntax expansion.

                                                       The option only takes effect when global expansion is disabled.

                                                      

                                                       Note – The normal mode when expansion is enabled is to separate some

                                                       constructs with a space. These constructs are selectable at the bottom of the format

                                                       section in the Flags pane. Also, any embedded horizontal whitespace is not touched.

                                                       With the addition of this new option, time was taken to redesign the effect when

                                                       the global expansion flag is disabled. Now any air (whitespace between constructs)

                                                       is taken out.

                                                       Setting global expansion off then back on has the effect of taking the air out

                                                       then re-expanding constructs. Doing this does not affect the current comment formatting.

 

 

Version 8.5 – 0      1/21/2018          Release minor sub-version 8.5.

                                                      

                                                       Upgraded  RegexFormat  to  Unicode 10.0,  CLDR 32,  ICU4 – 60.2 UCD.

 

 

Version 8.4 – 2        1/11/2018          Added the current Regex engine type text to the Formatted Output tab label (MDI document),

                                                       which is currently selected in the flags pane.

                                                       When the engine format type changes, the engine  type is included in the formatted tab label.

                                                       Also, when changed, an arrow indicator is set in the State button text as a reminder

                                                       that the regex source needs to be reformatted. The reminder arrow disappears upon

                                                       subsequent formatting. This extends the properties to a more easily visible location.

 

 

Version 8.4 – 0      12/1/2017          Release minor sub-version 8.4.

                                                      

                                                       Summary:                                                       

                                                       A Cumulative update ( which includes the previous regex engine modifications ),

                                                       new changes and some bug fixes.

 

                                                       New:

                                                       - Within the Hex Reader  dialog.

                                                          Replaced “CRLF” metrics and highlighting to encompass all Unicode line breaks.

                                                          Modified “Whitespace” metrics and highlighting to encompass all Unicode whitespace.

                                                       - A Save All Modified menu item added in the “File” menu. Includes a Yes to all button within the dialog.

                                                      

                                                       Fixed:

                                                       - Within the Format regex code, fixed a bug where some whitespace was not getting

                                                         escaped when in X-Mode (eXpanded).

                                                       - Within the Strings To Regex(Ternary Tree)  dialog, fixed a bug in the Simple Factoring algorithm.

 

 

Version 8.3 – 8        11/16/2017        Regex engine modifications:

                                                       - Allow Back References  to undefined groups (not yet parsed).

                                                       - Allow Nested Back References.

                                                       Note – these are significant changes to the boost regex engine, and these and the other mods

                                                       bring it up to par (and performance) with Perl’s regex engine.

 

Version 8.3 – 7        11/6/2017          Added another new option to the Strings to Regex Ternary Tree Tool:

                                                       Do group factoring. Screenshot:  Strings to Regex – Ternary Tree       

 

Version 8.3 – 6        10/27/2017        Extended the  Leap Year Range to Regex  tool’s year range to  0 – 9999.

      

 

 

Version 8.3 – 0      10/4/2017          Release minor sub-version 8.3.

                                                       Completed the system wide conversion started in Version 8.2 – 18.

                                                       Changes apply to ICU  mode  (UTF-32) only !

                                                       The non-ICU mode regex operations remain unchanged.

                                                      

                                                       Summary:                                                       

                                                       Removed the facet overhead (UTF-16 to UTF-32) of searching a target string.

                                                       Now uses u32string iterators directly for regex search / replace  operations.

                                                       This includes using u32string when constructing regex,  meaning

                                                       surrogate pairs and stand alone surrogates are resolved to UTF-32 codepoints.

                                                       Results are correctly mapped / highlighted back to the wide string displays.

                                                      

                                                       Affected code:  All places in the application that use ICU mode (UTF-32).
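How surrogate pairs and stand-alone surrogates resolve to UTF-32 code points can be sketched as below. This illustrates the standard UTF-16 arithmetic only, not the application's internal code; the function name is hypothetical.

```python
def resolve_surrogates(units):
    """Convert a sequence of UTF-16 code units to UTF-32 code points.

    A valid high/low pair becomes one supplementary code point;
    a stand-alone surrogate is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out
```

With the input `[0x0041, 0xD83D, 0xDE00, 0xD800]`, the pair collapses to `0x1F600` and the trailing lone surrogate stays as `0xD800`.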

 

 

Version 8.2 – 25      9/28/2017          Upgraded the regex engine to version 1.65.1

                                                       All modifications are carried forward.

 

Version 8.2 – 23      9/25/2017          Fixed a minor bug where, in certain circumstances, the floating close button

                                                       failed to display (when enabled) on mouse-over of the MDI tab.

                                                      

                                                       Added / modified features in the Layout->Document & MDI Tabs menu:

                                                       - Enable Active Tab Bold Font    ( default = false )

                                                       - Tab Border Width     ( 0 - 5 pixels,  default = 2 )

                                                       - Text Shading - Inactive View     ( None, 20% - 50%,  default = 20% )

 

Version 8.2 – 21      9/22/2017          Introducing a new regex generate tool:    Leap Year Range to Regex

                                                       A truly accurate tool that lets you generate a custom Leap Year regex given a range of years.

                                                       Multiple compression levels are selectable to suit any project and performance preference.

                                                       This is the first installment of a Date/Time regex generation suite soon to be available.

                                                       Screen shots:     Ly1     Ly2     Ly3     Ly4     Ly5     Ly6     Ly7     Ly8     Ly9
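To give a feel for what a leap year regex looks like, here is a hand-written one for the full four-digit range 0000 to 9999, checked against Python's `calendar.isleap`. It is an assumption-laden sketch, not output from the tool, and it does not model the tool's custom ranges or compression levels.

```python
import calendar
import re

# First alternative: years not ending in 00 whose last two digits are divisible by 4.
# Second alternative: century years whose first two digits are divisible by 4
# (i.e. the year is divisible by 400).
LEAP_YEAR = re.compile(
    r"(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])"
    r"|(?:[02468][048]|[13579][26])00)"
)

def is_leap_by_regex(year: int) -> bool:
    """Test a year in 0..9999 by matching its zero-padded four-digit form."""
    return LEAP_YEAR.fullmatch(f"{year:04d}") is not None

# Compare with the Gregorian rule used by the standard library.
mismatches = [y for y in range(0, 10000)
              if is_leap_by_regex(y) != calendar.isleap(y)]
```

An empty `mismatches` list means the regex agrees with the arithmetic rule for every year in the range.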

 

Version 8.2 – 20      9/13/2017          Added a new option to the Strings to Regex Ternary Tree Tool:

                                                       Convert alternations  (?: x | y | z )  to class  [ x y z ]
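The transformation can be illustrated with a small sketch that handles only the simplest case: a non-capturing group whose alternatives are each a single alphanumeric character. The function name is hypothetical and the tool's own rules are certainly broader.

```python
import re

def alternation_to_class(group: str) -> str:
    """Rewrite (?: x | y | z ) as [xyz]; return the input unchanged otherwise."""
    m = re.fullmatch(r"\(\?:(.*)\)", group, re.DOTALL)
    if not m:
        return group
    # Whitespace around alternatives is ignored, as in expanded mode.
    alts = [a.strip() for a in m.group(1).split("|")]
    if not all(len(a) == 1 and a.isalnum() for a in alts):
        return group
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return "[" + "".join(dict.fromkeys(alts)) + "]"
```

A group such as `(?:ab|c)` is left alone because `ab` is not a single character.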

 

Version 8.2 – 19      9/11/2017          Fixed an issue on the 32-bit version where using MemDC for virtual list control with more than

                                                       500,000 items significantly slowed performance.

                                                       The 64-bit version is unaffected.  These virtual lists are used to display Unicode names.

                                                      

 

Version 8.2 – 18   9/8/2017            General Modifications:  Removed the facet overhead (utf16 to utf32) of searching a target string

                                                       when  in  ICU  mode. Now uses UString32 iterators directly for regex search operations.

                                                      

                                                       Affected code: 

                                                       - Benchmark suite,  100% speed increase in ICU  flagged regex.

                                                       - UCD Interface,  100% speed increase in Custom Rx and CodePoints pages.

                                                      

                                                       Note that the UCD Interface pages now have the full Code Point range available for query.

                                                       This includes leading/trailing surrogates and non-characters as well.

 

 

Version 8.2 – 14      8/30/2017         Modified Benchmark suite – Added a custom control vertical bar with thumb indicating

                                                       current top slot. This is a subtle visual indicator when scrolling slots.

                                                      

                                                       UCD – Custom Rx page, expanded the regex input box.

                                                       Fixed a minor startup issue on this page.

 

 

Version 8.2 – 11      8/22/2017          Regex engine modification:  Corrected  Non-word boundary construct \B.

                                                       Previously, it did not correctly match at the beginning or end of a string when the adjacent

                                                       character was a non-word character.
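The expected \B behavior can be seen with Python's `re` module, used here only as a stand-in for the modified engine. The helper name is hypothetical.

```python
import re

def non_word_boundaries(text: str):
    """Return every position in text where \\B (non-word boundary) matches."""
    return [m.start() for m in re.finditer(r"\B", text)]

# In "-ab-" the string starts and ends with a non-word character, so \B
# matches at both ends as well as between the two word characters.
positions = non_word_boundaries("-ab-")
```

In "-ab-" the positions are 0, 2 and 4; in "ab" only position 1 qualifies, because both ends are word boundaries.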

 

                                                       Modified the Match Results title to display the regex options used to obtain the last match.

                                                       This is an important visual aid to help quickly diagnose possible wrong, invalid or non-matches.

 

                                                       Expanded the Benchmark suite to eight slots available per run.

                                                       The suite has been renamed to Mega-Bench 8 to reflect the increase in slots.

                                                       Screenshots:   Bench1   Bench2   Bench3   Bench Report Generator 

 

 

Version 8.2 – 6        7/27/2017          Modified Benchmark suite to update an item’s run display result immediately when

                                                       its run finishes. Previously, item display results were updated upon completion of the last run.

                                                       In the next update we will be adding more item slots (currently there are 2 available for runs).

 

Version 8.2 – 5        6/21/2017          Added a  Mark Location  debug option to the Mega-String control.

                                                       This option is only enabled for the Parsing function. It adds  = text = marks at

                                                       the location where start and end string quote delimiters were parsed and removed.

                                                       This option helps diagnose errant string quoting.

 

                                                       Additionally, if the Un-escape delimiters box is checked, it adds a mark where the opening or closing

                                                       delimiter was removed,  or a mark indicating that no delimiter was found but should be at this location.

                                                       Note that un-escaping escaped delimiters does not involve marking.

                                                       This option helps diagnose errant delimited regex.

                                                       Marking is available for parsing functions: Single, Double, and No Quoting.

 

                                                       Screenshot:     Mega-String : Mark Location 

 

Version 8.2 – 2        5/23/2017          Added Python’s Raw String syntax generation to the  Mega-String control.

                                                       Options include double r"  " or single r'  ' quote constructs, as well as the optional intelligent

                                                       padding already built into the Mega-String control. Optionally continues lines with + "\n" for multi-line.

                                                       Safeguards against an odd number of escapes anywhere in the target as well as at the end of the string,

                                                       and provides proper escaping of delimiters.

                                                       Screenshot:     Mega-String : Python Raw Strings 
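The trailing-backslash problem that such safeguards address can be sketched in Python itself. A raw literal cannot end in an odd number of backslashes, so this hypothetical helper moves the last one into an adjacent ordinary literal. It is a simplified illustration, not the Mega-String algorithm, and it falls back to `repr()` for text it cannot place in a raw literal.

```python
import ast

def to_raw_literal(pattern: str) -> str:
    """Return Python source for a string literal equal to pattern."""
    # Raw literals cannot hold control characters or both quote styles verbatim.
    if not pattern.isprintable() or ('"' in pattern and "'" in pattern):
        return repr(pattern)
    quote = "'" if '"' in pattern else '"'
    trailing = len(pattern) - len(pattern.rstrip("\\"))
    if trailing % 2 == 1:
        # Odd trailing escapes: emit the final backslash as a separate
        # non-raw literal; adjacent literals are concatenated by Python.
        return f'r{quote}{pattern[:-1]}{quote} "\\\\"'
    return f"r{quote}{pattern}{quote}"

samples = [r"\d+\.\d*", r'say "hi"\s', "C:\\dir\\", "\\", "a'b\"c", ""]
round_trips = [ast.literal_eval(to_raw_literal(s)) == s for s in samples]
```

Each sample is evaluated back with `ast.literal_eval` to confirm that the generated literal reproduces the original text.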

 

Version 8.2 – 1        5/14/2017          Added Regex Replace Format String Syntax to include Perl, Sed, Literal, and Boost-Extended.

                                                       Formerly, by default, the Perl format string was used in replacements with no other options.

                                                       This can be set within the Macro Manager dialog just above the replace edit box.

 

 (n/a)                       5/4/2017           Updated IIS7 web.config to allow .rxf mime type sample files to be downloaded.

                                                       These sample files can now be downloaded from the Download directory.

 

Version 8.2 – 0      4/24/2017          Release minor sub-version 8.2.

                                                       Updated to Regex engine 1.64. All modifications are carried forward.

 

Version 8.1 – 1        4/19/2017          Regex engine modification to fix a bug in class intersection.

                                                       Update to this if version 8.1-0 was installed.

 

Version 8.1 – 0      4/12/2017          Release minor sub-version 8.1.

                                                       Regex engine modifications to correctly handle class intersection.

                                                       Example [^\W\D] matches only digits.
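The example works by De Morgan's law: excluding everything that is a non-word or a non-digit leaves only characters that are both word characters and digits. Python's `re` module, used here purely as a stand-in for the modified engine, shows the same result.

```python
import re

# [^\W\D] = NOT (non-word OR non-digit) = word AND digit
INTERSECTION = re.compile(r"[^\W\D]")

def word_and_digit_chars(text: str):
    """Return the characters that are both word characters and digits."""
    return INTERSECTION.findall(text)

found = word_and_digit_chars("a1_b2 ")
```

Letters, the underscore and the space are all rejected; only the digits 1 and 2 remain.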

 

Version 8.0 – 14      4/1/2017            Modified UCD Property Search to trim whitespace and added an automatic tokenize feature.

                                                       If the initial string is not found, the tokenized parts will be searched for instead.

                                                       The token delimiters can consist of any of these characters   <space> _ - , . ' * " ; \t
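The trim-then-tokenize fallback might look like the following sketch. The delimiter set is taken from the list above; the case-insensitive substring matching, the function name and the sample property names are assumptions made for illustration.

```python
import re

# Delimiters: space _ - , . ' * " ; tab
TOKEN_DELIMITERS = r"[ _\-,.'*\";\t]+"

def search_properties(query: str, names):
    """Search names for query; if nothing is found, search for its tokens."""
    query = query.strip()
    hits = [n for n in names if query.lower() in n.lower()]
    if hits or not query:
        return hits
    tokens = [t for t in re.split(TOKEN_DELIMITERS, query) if t]
    return [n for n in names if any(t.lower() in n.lower() for t in tokens)]

names = ["Script_Extensions", "General_Category", "White_Space"]
result = search_properties("  general-script ", names)
```

The query "general-script" matches nothing as a whole, so it is split into "general" and "script", which find General_Category and Script_Extensions.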

 

Version 8.0 – 13      3/23/2017          Fixed a Benchmark issue when advancing position on a zero-length match.

                                                       In a rare case, this resulted in incorrectly reporting the number of matches on a run.
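The kind of position handling involved can be sketched as a match-counting loop. When a match is zero-length, the scan must step forward by one position, otherwise it would find the same empty match forever. This is an illustrative loop using Python's `re`, not the benchmark's code, and the counting convention is an assumption.

```python
import re

def count_matches(pattern: str, text: str) -> int:
    """Count successive matches, advancing past zero-length matches."""
    regex = re.compile(pattern)
    count, pos = 0, 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        count += 1
        # Zero-length match: end() == start(), so bump the position by one.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return count
```

With the pattern `a*` on "baaac", the loop counts an empty match at 0, "aaa" at 1, and empty matches at 4 and 5.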

 

Version 8.0 – 9        3/14/2017          Added a   Unique    page to the UCD Interface dialog.

                                                       This has the same functionality as the Codepoints and Custom-Rx pages, except the regex

                                                       object is removed.  It is instead replaced by an input edit box to paste or type any string.

                                                       The string is analyzed for unique codepoints which are displayed in the result.

                                                       The result can then be processed using the same features as in the Codepoints and Custom-Rx pages.
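Extracting the unique code points of a string can be sketched in a few lines. The function name and the output layout are hypothetical; character names come from Python's bundled `unicodedata` tables, not from the application's UCD cache.

```python
import unicodedata

def unique_codepoints(text: str):
    """Return (U+XXXX, name) pairs for each distinct code point, in order."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
        for cp in sorted(set(map(ord, text)))
    ]

summary = unique_codepoints("abba")
```

The string "abba" contains four characters but only two unique code points, U+0061 and U+0062.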

 

Version 8.0 – 8        3/10/2017          Added a   Custom-Rx    page to the UCD Interface dialog.

                                                       This has the same functionality as the Codepoints page, except the regex

                                                       object is editable.  Thus, any regex construct can be used to obtain a codepoint set.

                                                       Properties from the UCD regex cache can be easily added, mixed, and matched within the regex.

 

Version 8.0 – 6        2/21/2017          Some UCD navigation improvements; the tab control is now prevented from getting focus.

 

Version 8.0 – 5        2/20/2017          Post-release:  Fixed an issue that caused a crash

                                                       when trying to drag dockable panes after accessing the UCD names page.

                                                       If using a version between 8.0.0 and 8.0.4, it is recommended that it be

                                                       upgraded to  version 8.0.5.

 

 

_________________________________________________________________________________

_________________________________________________________________________________

 

New Unicode features:

 

A few ‘Super Controls’ are new - UCD (Unicode Character Database) Interface

using ICU4 58.2. Overhaul of regex engine with full Unicode 9 support, Properties

(over 1200) and Names (0x10FFFF). Includes all scripts and script extensions.

 

UCD Info Page :   UCD Interface Usage

 

UCD Tab Screenshots :   Usage    Properties    Codepoints    Names    Unique    Custom-Rx

 

New viewer available from all editors :   Uni-Name Viewer

 

 ___________________________________________________

 

Included features:

 

This application parses,  dynamically formats/expands/compresses Regular Expressions.

Includes a built-in testing regex engine derived and modified from Boost Regex 1.64.

Includes a regex benchmarking suite.

Uses and includes the  ICU4 58.2 Library.

Includes the UCD (Unicode Character Database) Interface, a ‘Super Control’ suite.

Many new controls, including a Unicode Name Viewer to go with the existing Hex Viewer.

View anything from anywhere; it’s integrated into all editors.

 

See Online Manual (Deprecated)

 

The core:

 

Its many strong features include formatting, expanding, compressing expressions,

advanced comment handling, auto-generated capture group comments, analysis

tools, padding, Raw/Single/Double quoted String construction of finished expressions

that can be pasted into development code.

 

Includes independent property views of the current regular expression providing a quick

look at its state and comprehensive construct metrics and error analysis information.

Errors can be selected in different views. For example, when an error is selected from

the view list, it is instantly selected in both the input and output views; when selected

from the output, it is selected in the input and error list, and so on. This makes

debugging quite easy.

 

Also included is a selectable, completely customizable analysis overlay of conditionals

and capture group counting (including named groups last), as well as annotated error

reporting of the entire expression embedded in the formatted output.

Formatting continues to the end of the expression regardless of errors, thus providing

a single-pass, downstream look past possibly trivial errors.

 

A Flags pane is provided to easily turn on/off options and settings.

Over 400 internal flag bits control the parsing/formatting engine giving maximum

flexibility to precisely control how the expression is parsed, how it is expanded or

compressed, and the look and shape of the formatted output.

Its solid parsing foundation covers almost all individual constructs available in

Regular Expressions, and each is individually selectable. There are built-in

presets for the major flavors, but everything can be customized, giving the ability to

define custom language presets.

 

Included Presets:

·         User-Defined

·         Default

·         Custom

·         Perl

·         PCRE

·         Dot-Net

·         Java 6

·         Java 7

·         JavaScript

 

Expressions with embedded ‘expanded’ or ‘compressed’ modes are handled seamlessly

by the engine.

 

Easily unveil the most complex packed expressions in existence with the click of a button.

Debug, refactor, make changes, then pack it back up for production.

Save the document (.rxf) with all of its views and Flags state, open it later when the

time comes for modification or maintenance or for quick recollection.

 

Whether a novice or expert, if you use Regular Expressions, this application will save

you hours of work.  See it, change it, and maintain it as real code.

 

 

Supported Platforms:

Windows XP, Vista, 7, 8, 10

 

Download RegexFormat

A zipped install of the latest version can be downloaded here

->       32-bit : Version 8   and   64-bit : Version 8

 

Manual/Help File:

(Deprecated)

Version 4.2 manual is included in the installation (or available online – see above link),

but can also be downloaded here ->  Manual/Help File

 

Installation: 

Unzip the files to a temporary directory then run the  Setup.exe  program.

The installed  Samples  directory contains data files with which to evaluate the application.

Miscellaneous samples can be obtained and are added to the Samples directory.

 

 

To Purchase:

Single and Multi-Site License(s) are offered and are now available for purchase.

Accepted payment methods include Major Credit Card or PayPal account.

Questions can be directed to support@regexformat.com

 

Choose a RegexFormat license purchase option:

 

Ø  Single License -   Price  $29 (USD)

 

 

 

Ø  Multi-Site License -   Price  $25 (USD) / ea. , quantity 2-100

(Requires an organization name/address)

 

                                          

 

A  registration key will be emailed to you after the purchase process completes.

 

________________________________________________________

 

RDNC Software

RegexFormat – Copyright  ©  2013 – 2018  RDNC Software

________________________________________________________