Strip HTML tags from content sent as Markdown

The `toMarkdown` function prepares content that is sent primarily to Diaspora.

By default, the HTML to Markdown converter "preserves HTML tags without Markdown equivalents like `<span>` and `<div>`", at least according to the README in _/friendica/vendor/league/html-to-markdown/_. The same README also says: "To strip HTML tags that don’t have a Markdown equivalent while preserving the content inside them, set strip_tags..."

Diaspora, however, does not appear to know what to do with the HTML sent to it. It appears to _encode_ the HTML and display the *code* in the post body rather than rendering it as HTML. It therefore makes more sense to strip all tags that have no Markdown equivalent.

Random Penguin 2025-04-20 12:05:26 -05:00 committed by GitHub
parent cd3d412a59
commit 09c6061810

@@ -689,7 +689,7 @@ class HTML
 	public static function toMarkdown(string $html): string
 	{
 		DI::profiler()->startRecording('rendering');
-		$converter = new HtmlConverter(['hard_break' => true]);
+		$converter = new HtmlConverter(['hard_break' => true, 'strip_tags' => true]);
 		$markdown = $converter->convert($html);
 		DI::profiler()->stopRecording();