cmark

0.29.0

CommonMark parsing and rendering library and program in C
commonmark/cmark

What's New

cmark 0.29.0

2019-04-08T17:31:59Z

[0.29.0]

  • Update spec to 0.29.
  • Make rendering safe by default (#239, #273). Adds CMARK_OPT_UNSAFE and make CMARK_OPT_SAFE a no-op (for API compatibility). The new default behavior is to suppress raw HTML and potentially dangerous links. The CMARK_OPT_UNSAFE option has to be set explicitly to prevent this. NOTE: This change will require modifications in bindings for cmark and in most libraries and programs that use cmark. Borrows heavily from @kivikakk's patch in github#123.
  • Add sourcepos info for inlines (Yuki Izumi).
  • Disallow more than 32 nested balanced parens in a link (Yuki Izumi).
  • Resolve link references before creating setext header. A setext header line after a link reference should not create a header, according to the spec.
  • commonmark renderer: improve escaping. URL-escape special characters when escape mode is URL, and not otherwise. Entity-escape control characters (< 0x20) in non-literal escape modes.
  • render: only emit actual newline when escape mode is LITERAL. For markdown content, e.g., in other contexts we want some kind of escaping, not a literal newline.
  • Update code span normalization to conform with spec change.
  • Allow empty <> link destination in reference link.
  • Remove leftover includes of memory.h (#290).
  • A link destination can't start with < unless it is an angle-bracket link that also ends with > (#289). (If your URL really starts with <, URL-escape it.)
  • Allow internal delimiter runs to match if both have lengths that are multiples of 3. See commonmark/commonmark#528.
  • Include references.h in parser.h (#287).
  • Fix [link](<foo\>).
  • Use hand-rolled scanner for thematic break (see #284). Keep track of the last position where a thematic break failed to match on a line, to avoid rescanning unnecessarily.
  • Rename ends_with_blank_line with S_ prefix.
  • Add CMARK_NODE__LAST_LINE_CHECKED flag (#284). Use this to avoid unnecessary recursion in ends_with_blank_line.
  • In ends_with_blank_line, call S_set_last_line_blank to avoid unnecessary repetition (#284). Once we settle whether a list item ends in a blank line, we don't need to revisit this in considering parent list items.
  • Disallow unescaped ( in parenthesized link title.
  • Copy line/col info straight from opener/closer (Ashe Connor). We can't rely on anything in subj since it's been modified while parsing the subject and could represent line info from a future line. This is simple and works.
  • render.c: reset last_breakable after cr. Fixes jgm/pandoc#5033.
  • Fix a typo in houdini_href_e.c (Felix Yan).
  • commonmark writer: use ~~~ fences if info string contains backtick. This is needed for round-trip tests.
  • Update scanners for new info string rules.
  • Add XSLT stylesheet to convert cmark XML back to Commonmark (Nick Wellnhofer, #264). Initial version of an XSLT stylesheet that converts the XML format produced by cmark -t xml back to Commonmark.
  • Check for whitespace before reference title (#263).
  • Bump CMake to version 3 (Jonathan Müller).
  • Build: Remove deprecated call to add_compiler_export_flags() (Jonathan Müller). It is deprecated in CMake 3.0, the replacement is to set the CXX_VISIBILITY_PRESET (or in our case C_VISIBILITY_PRESET) and VISIBILITY_INLINES_HIDDEN properties of the target. We're already setting them by setting the CMake variables anyway, so the call can be removed.
  • Build: only attempt to install MSVC system libraries on Windows (Saleem Abdulrasool). Newer versions of CMake attempt to query the system for information about the VS 2017 installation. Unfortunately, this query fails on non-Windows systems when cross-compiling: cmake_host_system_information does not recognize <key> VS_15_DIR. CMake will not find these system libraries on non-Windows hosts anyways, and we were silencing the warnings, so simply omit the installation when cross-compiling to Windows.
  • Simplify code normalization, in line with spec change.
  • Implement code span spec changes. These affect both parsing and writing commonmark.
  • Add link parsing corner cases to regressions (Ashe Connor).
  • Add xml:space="preserve" in XML output when appropriate (Nguyễn Thái Ngọc Duy). (For text, code, code_block, html_inline and html_block tags.)
  • Removed meta from list of block tags. Added regression test. See commonmark/commonmark-spec#527.
  • entity_tests.py - omit noisy success output.
  • pathological_tests.py: make tests run faster. Commented out the (already ignored) "many references" test, which times out. Reduced the iterations for a couple other tests.
  • pathological_tests.py: added test for deeply nested lists.
  • Optimize S_find_first_nonspace. We were needlessly redoing things we'd already done. Now we skip the work if the first nonspace is greater than the current offset. This fixes pathological slowdown with deeply nested lists (#255). For N = 3000, the time goes from over 17s to about 0.7s. Thanks to Martin Mitas for diagnosing the problem.
  • Allow spaces in link destination delimited with pointy brackets.
  • Adjust max length of decimal/numeric entities. See commonmark/commonmark-spec#487.
  • Fix inline raw HTML parsing. This fixes a recently added failing spec test case. Previously spaces were being allowed in unquoted attribute values; no we forbid them.
  • Don't allow list markers to be indented >= 4 spaces. See commonmark/commonmark-spec#497.
  • Check for empty buffer when rendering (Phil Turnbull). For empty documents, ->size is zero so renderer.buffer->ptr[renderer.buffer->size - 1] will cause an out-of-bounds read. Empty buffers always point to the global cmark_strbuf__initbuf buffer so we read cmark_strbuf__initbuf[-1].
  • Also run API tests with CMARK_SHARED=OFF (Nick Wellnhofer).
  • Rename roundtrip and entity tests (Nick Wellnhofer). Rename the tests to reflect that they use the library, not the executable.
  • Generate export header for static-only build (#247, Nick Wellnhofer).
  • Fuzz width parameter too (Phil Turnbull). Allow the width parameter to be generated too so we get better fuzz-coverage.
  • Don't discard empty fuzz test-cases (Phil Turnbull). We currently discard fuzz test-cases that are empty but empty inputs are valid markdown. This improves the fuzzing coverage slightly.
  • Fixed exit code for pathological tests.
  • Add allowed failures to pathological_tests.py. This allows us to include tests that we don't yet know how to pass.
  • Add timeout to pathological_tests.py. Tests must complete in 8 seconds or are errors.
  • Add more pathological tests (Martin Mitas). These tests target the issues #214, #218, #220.
  • Use pledge(2) on OpenBSD (Ashe Connor).
  • Update the Racket wrapper (Eli Barzilay).
  • Makefile: For afl target, don't build tests.

cmark

Build Status Windows Build Status

cmark is the C reference implementation of CommonMark, a rationalized version of Markdown syntax with a spec. (For the JavaScript reference implementation, see commonmark.js.)

It provides a shared library (libcmark) with functions for parsing CommonMark documents to an abstract syntax tree (AST), manipulating the AST, and rendering the document to HTML, groff man, LaTeX, CommonMark, or an XML representation of the AST. It also provides a command-line program (cmark) for parsing and rendering CommonMark documents.

Advantages of this library:

  • Portable. The library and program are written in standard C99 and have no external dependencies. They have been tested with MSVC, gcc, tcc, and clang.

  • Fast. cmark can render a Markdown version of War and Peace in the blink of an eye (127 milliseconds on a ten year old laptop, vs. 100-400 milliseconds for an eye blink). In our benchmarks, cmark is 10,000 times faster than the original Markdown.pl, and on par with the very fastest available Markdown processors.

  • Accurate. The library passes all CommonMark conformance tests.

  • Standardized. The library can be expected to parse CommonMark the same way as any other conforming parser. So, for example, you can use commonmark.js on the client to preview content that will be rendered on the server using cmark.

  • Robust. The library has been extensively fuzz-tested using american fuzzy lop. The test suite includes pathological cases that bring many other Markdown parsers to a crawl (for example, thousands-deep nested bracketed text or block quotes).

  • Flexible. CommonMark input is parsed to an AST which can be manipulated programmatically prior to rendering.

  • Multiple renderers. Output in HTML, groff man, LaTeX, CommonMark, and a custom XML format is supported. And it is easy to write new renderers to support other formats.

  • Free. BSD2-licensed.

It is easy to use libcmark in python, lua, ruby, and other dynamic languages: see the wrappers/ subdirectory for some simple examples.

There are also libraries that wrap libcmark for Go, Haskell, Ruby, Lua, Perl, Python, R and Scala.

Installing

Building the C program (cmark) and shared library (libcmark) requires cmake. If you modify scanners.re, then you will also need re2c (>= 0.14.2), which is used to generate scanners.c from scanners.re. We have included a pre-generated scanners.c in the repository to reduce build dependencies.

If you have GNU make, you can simply make, make test, and make install. This calls cmake to create a Makefile in the build directory, then uses that Makefile to create the executable and library. The binaries can be found in build/src. The default installation prefix is /usr/local. To change the installation prefix, pass the INSTALL_PREFIX variable if you run make for the first time: make INSTALL_PREFIX=path.

For a more portable method, you can use cmake manually. cmake knows how to create build environments for many build systems. For example, on FreeBSD:

mkdir build
cd build
cmake ..  # optionally: -DCMAKE_INSTALL_PREFIX=path
make      # executable will be created as build/src/cmark
make test
make install

Or, to create Xcode project files on OSX:

mkdir build
cd build
cmake -G Xcode ..
open cmark.xcodeproj

The GNU Makefile also provides a few other targets for developers. To run a benchmark:

make bench

For more detailed benchmarks:

make newbench

To run a test for memory leaks using valgrind:

make leakcheck

To reformat source code using clang-format:

make format

To run a "fuzz test" against ten long randomly generated inputs:

make fuzztest

To do a more systematic fuzz test with american fuzzy lop:

AFL_PATH=/path/to/afl_directory make afl

Fuzzing with libFuzzer is also supported but, because libFuzzer is still under active development, may not work with your system-installed version of clang. Assuming LLVM has been built in $HOME/src/llvm/build the fuzzer can be run with:

CC="$HOME/src/llvm/build/bin/clang" LIB_FUZZER_PATH="$HOME/src/llvm/lib/Fuzzer/libFuzzer.a" make libFuzzer

To make a release tarball and zip archive:

make archive

Installing (Windows)

To compile with MSVC and NMAKE:

nmake

You can cross-compile a Windows binary and dll on linux if you have the mingw32 compiler:

make mingw

The binaries will be in build-mingw/windows/bin.

Usage

Instructions for the use of the command line program and library can be found in the man pages in the man subdirectory.

Security

By default, the library will scrub raw HTML and potentially dangerous links (javascript:, vbscript:, data:, file:).

To allow these, use the option CMARK_OPT_UNSAFE (or --unsafe) with the command line program. If doing so, we recommend you use a HTML sanitizer specific to your needs to protect against XSS attacks.

Contributing

There is a forum for discussing CommonMark; you should use it instead of github issues for questions and possibly open-ended discussions. Use the github issue tracker only for simple, clear, actionable issues.

Authors

John MacFarlane wrote the original library and program. The block parsing algorithm was worked out together with David Greenspan. Vicent Marti optimized the C implementation for performance, increasing its speed tenfold. Kārlis Gaņģis helped work out a better parsing algorithm for links and emphasis, eliminating several worst-case performance issues. Nick Wellnhofer contributed many improvements, including most of the C library's API and its test harness.

Description

  • Swift Tools
View More Packages from this Author

Dependencies

  • None
Last updated: Fri Dec 20 2024 19:56:38 GMT-1000 (Hawaii-Aleutian Standard Time)