#dev 2020-05-27

2020-05-27 UTC
#
[chrisaldrich]
With so many wanting it and thinking about it maybe we should brainstorm a session on this functionality at /2020/West?
#
[chrisaldrich]
To my knowledge it isn't something offered previously by any silos either, is it?
#
[chrisaldrich]
snarfed++ for always having well documented feature requests at his fingertips...
#
Loqi
snarfed has 50 karma in this channel over the last year (91 in all channels)
gRegorLove, [chrisaldrich] and supernovah joined the channel
#
Loqi
This guide is about the HTML syntax for responsive images (and a little bit of CSS for good measure). The responsive...
#
jacky
note the perf vs design control
[schmarty], nickodd, [tw2113], [LewisCowles], [snarfed] and KartikPrabhu joined the channel
#
[LewisCowles]
[snarfed] I can't find the granary user-agent, do you have it recorded anywhere or do I have to search my server logs?
#
[LewisCowles]
hmm nothing saying granary in any case, just `"python-requests/2.23.0`
#
[snarfed]
yeah, sorry, thought it set one, but guess not. bridgy does. i can take a feature request
#
gRegorLove
sounds like a bug that it's listing your homepage as an alternate, too
nickodd left the channel
#
[LewisCowles]
TBH this is the most grumpy old man thing I've done. It's not just the homepage thing, it's the lack of creative control. I explicitly chose not to offer Atom
#
[LewisCowles]
the block is simple. For now it just detects (case-insensitive) python in the user agent and blocks it.
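A minimal sketch of that check, in Python rather than whatever the server actually runs; the function name `is_blocked` is an assumption made for illustration:

```python
from typing import Optional


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent header mentions "python" in any letter case."""
    # A missing header is treated as an empty string, so it is not blocked.
    return "python" in (user_agent or "").lower()
```

With the user agent seen in the logs above, `is_blocked("python-requests/2.23.0")` returns `True`. A rule this broad also blocks every other Python-based client, not just granary.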
#
jacky
entering a situation where people might have multiple urls for their h-card
#
jacky
and I want only _one_ for their contact card
#
jacky
actually hm
#
jacky
just answered it lol
swentel joined the channel
#
jacky
wow amazing
#
jacky
representative-hcard++
#
Loqi
representative-hcard has 1 karma over the last year
#
@ChrisAldrich
I've seen some using @hypothes_is as a blog commenting system. Since they don't have Webmention support or require programming for notifications, I'm highlighting this as a simple/elegant user interface for notifications [more...] #annotations #RSS https://boffosocko.com/2020/05/26/55771462/
(twitter.com/_/status/1265525715790856195)
#
jacky
nope still breaks for bridgy
#
jacky
snowflakes--
#
Loqi
snowflakes has -1 karma over the last year
cweiske joined the channel
#
gRegorLove
what breaks?
#
jacky
tl;dr: I'm working on logic to resolve incoming webmentions and I have to add a case where the root `u-url` doesn't match the URL of the incoming content
#
[LewisCowles]
can you just patch the u-url value?
#
[LewisCowles]
feels like a (relatively) straightforward string replacement.
#
jacky
patch how?
#
jacky
lol wow this is going to be annoying
#
jacky
but it's okay for now
#
jacky
is grumbling in "I want only one URL for authors, don't give me two"
#
[LewisCowles]
Oh, I might have misunderstood. When you said
#
[LewisCowles]
> I'm working on logic to resolve incoming webmentions and I have to add a case where the root `u-url` doesn't match the URL of the incoming content
#
[LewisCowles]
I assumed you were running a proxy script to get the HTML of the page, in which case you could alter it before returning it
#
jacky
not a script, this is core functionality around webmentions to my site
#
jacky
so like when it comes in, I do nothing with it immediately (just capturing all of the parameters received)
#
jacky
after a few seconds depending on the queue's load, it then gets processed (pulling the whole page's HTML, then getting the whole MF2 for the page if anything)
#
jacky
after that, I attempt to resolve the author of the entry 'representing' the page
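The deferred flow described above could be sketched roughly like this in Python. `WebmentionQueue`, `fetch_html` and `parse_mf2` are hypothetical names; the real site's queue, HTTP client and mf2 parser are not shown in this log:

```python
from collections import deque
from typing import Callable, Optional


class WebmentionQueue:
    """Capture incoming webmentions first, fetch and parse them later."""

    def __init__(self) -> None:
        self._pending: deque = deque()

    def receive(self, params: dict) -> None:
        # Step 1: store exactly what was sent; no fetching or validation yet.
        self._pending.append(dict(params))

    def process_next(
        self,
        fetch_html: Callable[[str], str],
        parse_mf2: Callable[[str], dict],
    ) -> Optional[dict]:
        # Step 2: later, pull the whole source page and parse its mf2.
        if not self._pending:
            return None
        params = self._pending.popleft()
        html = fetch_html(params["source"])
        return {"params": params, "mf2": parse_mf2(html)}
```

Author resolution would then run on the returned `mf2` data as a third step.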
#
jacky
my first problem was doing the entry pulling; usually there can be multiple top-level items on a page (like if you have a h-feed below a h-entry); if the URL provided matches that of the h-entry on the page, I pick it out
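As a rough Python illustration of that selection step, assuming the page has already been parsed into the standard mf2 JSON shape (`items`, each with `type` and `properties`); `pick_entry` is a made-up helper name and the comparison is a plain string match with no URL normalization:

```python
from typing import Optional


def pick_entry(mf2: dict, source_url: str) -> Optional[dict]:
    """Return the top-level h-entry whose `url` property lists source_url."""
    for item in mf2.get("items", []):
        if "h-entry" not in item.get("type", []):
            continue  # skip h-feed, h-card and other top-level items
        if source_url in item.get("properties", {}).get("url", []):
            return item
    return None  # e.g. a page whose entry URL differs from the source URL
```

The `None` branch is where a case like the bridgy response pages mentioned next would need separate handling.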
#
jacky
bridgy response pages _don't_ do that (understandably, they're meant to be ephemeral)
#
jacky
but the _author_ bit is tricky
#
jacky
bridgy gives the URL defined by a user's profile in Twitter _and_ their Twitter profile
#
jacky
tbh I was going to get around this by looking for a rep h-card in each URL but most of the time, they won't have one
#
gRegorLove
doesn't one of bridgy's author URLs have a u-uid too?
#
gRegorLove
checks, haven't looked at bridgy html in a while
#
jacky
looks at that
#
jacky
yeah but it doesn't seem like a valid URL?
#
jacky
or at least not one that I've seen before
#
jacky
looks like a URN
#
gRegorLove
not the h-card.u-uid
#
gRegorLove
I was thinking a u-url that also had u-uid
#
gRegorLove
Have an example bridgy link? I can't seem to find one of mine with two authors
#
Loqi
[Angie Jones] sorry to hear this, fam
#
gRegorLove
ah, gotcha. hm
#
jacky
that said, I did manage to fix a separately related problem
#
jacky
this multiple author URL thing _might_ be annoying
#
jacky
thinks he'll just filter out twitter URLs and if there's none left but one was used, resort to the intent follow page for mf2
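One way that filtering idea might look in Python. The helper names are invented here, and the intent-follow-page lookup is deliberately left out: the sketch only hands back the Twitter URL so the caller can decide what to do with it:

```python
from typing import List, Optional
from urllib.parse import urlparse


def is_twitter(url: str) -> bool:
    """True for twitter.com and its subdomains (e.g. mobile.twitter.com)."""
    host = (urlparse(url).hostname or "").lower()
    return host == "twitter.com" or host.endswith(".twitter.com")


def choose_author_url(urls: List[str]) -> Optional[str]:
    """Prefer the first non-Twitter author URL; fall back to a Twitter one."""
    non_twitter = [u for u in urls if not is_twitter(u)]
    if non_twitter:
        return non_twitter[0]
    # Nothing left after filtering: return the Twitter URL (if any) so the
    # caller can apply its own fallback for finding mf2.
    return urls[0] if urls else None
```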
#
gRegorLove
I was just thinking something similar. I think my code is currently just using the first u-url
#
jacky
my thing is
#
jacky
I want to do some extra logic elsewhere where if an incoming webmention's author is from someone in my contact list to automatically approve them for appearing on my site
#
jacky
I _could_ improve how contacts work to allow for multiple urls since that's a thing that _is_ allowed by the mf2 spec (and multiple uids grr)
#
gRegorLove
ahh. yeah, multiple URLs could be good so you could have them on an allowlist
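A small Python sketch of such an allowlist, assuming contacts are stored as a mapping from a contact name to a list of URLs (that storage shape, and the trailing-slash normalization, are assumptions for illustration only):

```python
from typing import Dict, Iterable, List


def normalize(url: str) -> str:
    """Very light normalization: trim whitespace and one trailing slash."""
    url = url.strip()
    return url[:-1] if url.endswith("/") else url


def should_auto_approve(author_urls: Iterable[str], contacts: Dict[str, List[str]]) -> bool:
    """Approve when any of the author's URLs belongs to any known contact."""
    known = {normalize(u) for urls in contacts.values() for u in urls}
    return any(normalize(u) in known for u in author_urls)
```

Because both sides are lists, an author with two URLs matches as long as either one is on a contact's card.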
#
gRegorLove
all mf2 properties can be multi-valued, though some like uid I think would be rare to see multiples (?)
#
jacky
rare but not unexpected
#
jacky
MAY is the curse of specs lol
#
dansup
bruh, remember when pleroma had that date bug that wouldn't federate posts until the 10th of each month?
#
dansup
I released IG Import for Pixelfed, being able to edit the import is a lot of work and I hope to ship it before the weekend
#
jacky
LOL wait
#
jacky
how did that get fixed?
#
jacky
grr this is _very_ edge-case-y
dckc, moppy, jmac_, kitt_, swentel, jamietanna[m] and fredcy_ joined the channel
#
Zegnat
Any TravisCI wizards out there who could shine some light on why it may have different test results than me locally? Or otherwise may have a way to try and reproduce the TravisCI environment locally? https://github.com/microformats/php-mf2/pull/163#issuecomment-634511073
[tantek] joined the channel
#
cweiske
all the non-mastermind builds fail
#
cweiske
did you try it without the masterminds/html5 lib locally?
#
[LewisCowles]
The other side can be caching behaviour(s), failures in setup, or flaky tests, but it seems like cweiske's answer is specific to this repo
#
[LewisCowles]
https://travis-ci.org/github/microformats/php-mf2/jobs/691642994 is incredibly strange to read. Some but not all whitespace has been stripped from the inside, not the outside... I'm noticing the rabbit hole, and moving on from it. Let the rabbits live
#
Zegnat
cweiske: I run both with and without tests locally, both pass.
#
Zegnat
[LewisCowles]: yes, it almost seems like the built-in parser is dropping the whitespace between opening HTML tags, while the user-land Masterminds HTML5 parser does not.
#
Zegnat
But the built-in parser in PHP for me locally does not have this issue…
#
[LewisCowles]
It's a non-trivial PR to read through. Mind if I pull it and run it?
#
[LewisCowles]
At least if it fails on my machine I have more visibility than travis. Otherwise publishing artifacts can help dig in to CI related issues.
#
[LewisCowles]
I've had systems where transfer changed encoding for some reason, and that broke CI & deploy. Most of the time, my approach is to try it on another machine, then a vm. Try to write down what I know is different. Occasionally I'll make ritualistic rules, like "don't use centos, its creators have done something to screw with ssh"
#
[LewisCowles]
passes for me on 20.04 focal fossa too. I did run `composer install` instead of `composer update`. Could that be the issue?
#
cweiske
composer install is correct to get the same state of dependencies as defined in composer.lock
djmoch_ joined the channel
#
Zegnat
Sorry, had to run to a meeting and then a lunch meeting.
#
Zegnat
Travis runs composer update only to make it so it has the correct dependencies for the PHP version it is testing.
#
Zegnat
The lock file includes the dependency tree for an install on PHP 7.4 (though probably the same dependency tree as for PHP 7.3). So if that is the platform you are on, just running install should suffice
#
Zegnat
There would not be a change for you if you were to run update.
[LewisCowles] joined the channel
#
[LewisCowles]
ah. You actually need the dependency tree for 5.4. If you think about it, it makes sense
#
[LewisCowles]
well the lowest version
#
[LewisCowles]
that has tripped me up before. The lowest common denominator, not the highest, is the one I lock against, in order to support the widest range of runtimes.
#
[LewisCowles]
did you know you are building within `trusty` on travis. It's fairly old. I had some odd travis issues the other day with OS version
#
[LewisCowles]
I'll set up a VM and see if trusty could be a factor
#
[LewisCowles]
it is most odd
#
[LewisCowles]
have you pushed more builds since?
#
cweiske
Zegnat, if you remove vendor/ and run "composer update" and then run the tests, do they fail?
#
Zegnat
Not for me locally, no, cweiske
djmoch joined the channel
#
Zegnat
Also of note, there are only dev dependencies, and those are only the test runners. There are no actual dependencies for the code that is being tested, that is all self-contained, so should experience the same output on all platforms
[KevinMarks] joined the channel
#
Loqi
If there’s one thing you can guarantee in tech, it’s that someone, somewhere, will declare that CSS isn’t up to the job of “big projects” and what will undoubtedly be recommended by those same people will be either a JavaScript-heavy approa...
#
cweiske
Zegnat, how many tests are run locally for you?
#
cweiske
for me it's only 317
#
cweiske
travis runs 399
#
cweiske
your PR
#
[LewisCowles]
I had Zegnat's branch checked out
#
[LewisCowles]
although I think I'm seeing far fewer than 317 tests
#
cweiske
also no fails here for the 399 tests
#
[LewisCowles]
are you having to pass arguments to vendor/bin/phpunit to get 399 tests?
#
Zegnat
Yes, central-test branch should run 399 tests, just by running phpunit
#
Zegnat
Specifically: Tests: 399, Assertions: 871, Skipped: 1, Incomplete: 1.
#
Zegnat
If you have the user-land HTML5 parser installed, that skipped one should disappear and run as normal.
#
Zegnat
And however I install, and whether I use php 7.3 or PHP 7.4, I always get all of them to pass :( So it feels like there is just *something* about the TravisCI environment.
#
cweiske
what if you remove "- $COMPOSER_REQUIRE" from .travis.yml?
#
Zegnat
I could test that. I don’t think it would do anything though. That variable is either empty, or triggers the installation of masterminds/html5. (Therefore the test runs twice on each PHP version.)
#
cweiske
the only difference between the working and failing runs is that the working ones run "composer require masterminds/html5", while the other ones run ""
#
Zegnat
Or do you mean to only run the tests with masterminds/html5, cweiske?
#
Zegnat
Yes. The working ones use a user-land HTML5 parser. The failing ones use the one built in to PHP.
#
Zegnat
We expect the library to work with both. Not all users have the user-land parser installed.
#
cweiske
(maybe travis does something strange when you say
#
cweiske
install:
#
[LewisCowles]
it was working for me with and without
#
[LewisCowles]
but I do also run install instead of update
#
cweiske
but you could also just update from trusty to something current
#
[LewisCowles]
also update worked (but I've only tried it after running install)
#
Zegnat
I know extremely little about TravisCI configurations, so I am probably the wrong person to do distro changes and the like …
#
Zegnat
This is just such a weird problem to pop up :/
#
[LewisCowles]
the change is
#
[LewisCowles]
dist: bionic
#
[LewisCowles]
in the main indented part, to check
#
[LewisCowles]
To be honest, it might be rewarding finding out if older Ubuntu releases are broken for some reason though
#
[LewisCowles]
I'm on trusty installing 5.6.40 now via phpenv, which Travis says it is using. It's an "experience"
#
Zegnat
I run basically everything natively on my MacBook. That will change when my new ThinkPad arrives, but until then, I am the wrong person to ask :P
#
Zegnat
Do let me know how the tests run in a more similar environment, [LewisCowles]!
#
[LewisCowles]
I'll be honest. PHP env seems like a PoS
#
[LewisCowles]
I do get failures under trusty, but I think that might be as I'm on 5.6
#
[LewisCowles]
I'll tell it to build 7.4 and see if that helps it pass
#
[LewisCowles]
It's the exact same failure as travis-ci
#
[LewisCowles]
I am wondering if your composer.json updates were only good for 7.3+
#
[LewisCowles]
which does not explain why it failed on 7.3 and 7.4
#
Zegnat
Also note that we have no dependencies for the library. It is only for running tests. So output should be the same as long as phpunit manages to run.
#
Zegnat
But if you can reproduce I guess I need to get a trusty environment set up to dig...
[jgmac1106] joined the channel
#
[LewisCowles]
I installed using virtualbox (manual). Probably should have used vagrant
#
Loqi
[phpenv] phpenv-installer: Install phpenv & php-build and update all of them when you want to
#
Loqi
[rogeriopradoj] phpenv-common-deps-install: Install common phpenv & php-build dependencies
#
[LewisCowles]
7.3 and 7.4 are refusing to install
#
[LewisCowles]
so 7.2 will be the next
#
[LewisCowles]
It's likely that however they got 7.3 & 7.4 in trusty, it's unclean
#
[LewisCowles]
it could be that phpenv is the issue
#
Zegnat
We see the exact same failures in PHP 5.6 on Travis. So I am not convinced it is a problem with PHP at all.
#
[LewisCowles]
phpenv is not php
#
[LewisCowles]
it's some nasty hack-work
#
[LewisCowles]
to allow multiple non-isolated runtimes
#
[LewisCowles]
I was not using phpenv on 20.04 in my 7.4 test
#
Zegnat
My gut feeling is that there is a difference in the XML lib they compile against. Because that could be the difference between PHP on my machine and the PHP they run
#
[LewisCowles]
although I'm only seeing whitespace differences
#
[LewisCowles]
are you using the XML lib to serialise whitespace?
#
Zegnat
I am thinking that because the user-land HTML parser passes the tests, but the built-in PHP one (which is really just libxml2) does not.
#
Zegnat
We run saveHTML to extract the HTML contents there. Which is a function from DOMDocument.
#
[KevinMarks]
Different xml parsers were a pain in python too
#
Zegnat
libxml2 had minor issues with HTML5 elements, so we recommend the user-land HTML5 parser. But it is slower and maybe not an option everywhere the parser is used, so we also try and test it with the basic libxml2 one.
#
Zegnat
Still weird that the same parsing code (PHP DOMDocument powered by libxml2) now shows different behaviours :/
#
[LewisCowles]
I managed to break many tests by using
#
[LewisCowles]
```
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
```
#
Zegnat
Yeah, you are not supposed to mess with the formatting, HTML is supposed to be returned verbatim per mf2 spec, I think
#
[LewisCowles]
it's not weird when you think about it if they are using fundamentally different revisions
#
Zegnat
[LewisCowles]: if you still have the failing setup. What if you make it do preserveWhiteSpace = true and formatOutput = false?
#
Zegnat
The issue seems to be it stripping whitespace, so …
#
[LewisCowles]
yeah I've kept it open while we spelunk
#
[LewisCowles]
hmm that didn't change anything. Wait a second.
#
Zegnat
Also, you probably need to set those settings right before we do loadHTML (line 365 ish)
#
[LewisCowles]
seems to have no effect which order they are put in
#
[LewisCowles]
just their values
#
[LewisCowles]
in fact, the extra failures earlier may be because I used the wrong variable name lol
#
[LewisCowles]
I'm exporting an ovf/ova which will be at https://www.lewiscowles.co.uk/trusty-vm.ova in about 10 minutes (after export and upload)
#
[LewisCowles]
anyone that wants to play along can. AFAIK there are no secrets although the VM is a little large because I installed desktop Ubuntu instead of server ubuntu
#
[LewisCowles]
oh the root password is password btw
cweiske joined the channel
#
Zegnat
Might grab that one after work to have a look! Your time has been much appreciated, even if we still do not know why PHP is doing what it is doing, haha!
[itsjustk] joined the channel
#
@paulca
We have a squeaky bathroom door, so I thought we need to oil it. Then I realised there are a few other doors in the house that need oiled. Then I thought "wouldn't it be cool if I could pour oil down one pipe and oil all the doors at once" That's how software engineers think.
(twitter.com/_/status/1265407453753217024)
#
Loqi
ok, I added "https://twitter.com/paulca/status/1265407453753217024?s=20" to the "See Also" section of /architecture_astronomy https://indieweb.org/wiki/index.php?diff=70081&oldid=65830
[schmarty] joined the channel
#
[schmarty]
looks like netlify added a couple of features that might open up some more indieweb building blocks
#
[schmarty]
they added a build plugin system that can (for example) help you cache files https://www.netlify.com/blog/2020/05/27/netlify-build-plugins-are-here/
#
[schmarty]
(seems like it might be usable for example to pull in link previews for reply contexts, cache mentions stored in webmention.io so you don't hit webmention.io on every build, etc.)
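The caching idea, independent of any Netlify plugin API, might be sketched in Python like this. `cached_mentions`, the cache file location and the one-hour default are all assumptions, and `fetch` stands in for whatever actually calls webmention.io:

```python
import json
import time
from pathlib import Path
from typing import Callable


def cached_mentions(cache_file: Path, fetch: Callable[[], list], max_age: float = 3600.0) -> list:
    """Reuse mentions saved by an earlier build unless the cache is too old."""
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < max_age:
        # Fresh enough: skip the network entirely.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    mentions = fetch()
    cache_file.write_text(json.dumps(mentions), encoding="utf-8")
    return mentions
```

For this to help across builds, the cache file has to live somewhere the build system preserves between runs.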
#
[schmarty]
they also added "edge handlers" which basically seem like any old function you like. they say it can be used for auth (hellooo IndieAuth?) but maybe also Micropub, your own webmention handling, more? https://www.netlify.com/blog/2020/05/27/introducing-edge-handlers-in-preview/?utm_campaign=Introducing+Build+Plugins&utm_content=Introducing+Build+Plugins&utm_medium=email_action&utm_source=customer.io
#
[schmarty]
agh crud missed those utms one sec.
#
Loqi
utm has -1 karma over the last year
flex14 joined the channel
#
Loqi
utm has -2 karma over the last year
#
[LewisCowles]
done some shopping. Zegnat, we know the exact behaviour that is telling the tests to fail. It's the solution. TBH, I'm partially happy it was the backport / trusty thing. It's 3 LTS releases ago, so I think just attempting to bump to xenial or bionic using
#
[LewisCowles]
dist: bionic
#
[LewisCowles]
should do the trick. That comes above where the PHP versions are defined in the travis.yml.
#
[LewisCowles]
What I'll do is fork your branch, make the change and point it at travis this side
#
Zegnat
Have a go at it [LewisCowles] and let me know what happens.
#
Zegnat
We are pretty conservative with the testing environment because of things like WordPress plugins depending on the parser. And you never know what environment people are running WP blogs in
dckc and [jgmac1106] joined the channel
#
[LewisCowles]
I'm cool with that. I believe the issue may not be trusty, but rather phpenv
#
[LewisCowles]
It refuses to build 5.6 or 7.0 for bionic, and still exhibits the shady failure of whitespace
#
[LewisCowles]
two ways I can think of to fix, are to fix up fixtures (not ideal), or to look deeper into why PHP is behaving differently. There will be a flag somewhere
#
aaronpk
hmm i wonder if my website supports changing the date of a post via micropub
#
aaronpk
it looks like it should... do I dare test this in production?
[tantek] joined the channel
#
[LewisCowles]
I'm moving sideways [Zegnat] to test using OpenBSD distro maintained PHP. If that works, Then I'll try phpenv on it, and if it fails, I'll point the blame stick there
#
[LewisCowles]
Slightly an excuse to try out openBSD 6.7
#
Zegnat
Happy to provide the excuse, [LewisCowles] ;)
gxt__ joined the channel
#
[LewisCowles]
it works on openBSD distro maintained PHP, now to try to get phpenv on that and see how badly it breaks
#
[LewisCowles]
coincidentally, this is why I refuse to use CentOS. They seem to have problems debian does not have. In this case it seems travis-CI choice is causing issues. I might also try some docker images I maintain
#
[LewisCowles]
some systems go poking into PHP upstream source and they should not. That is what contributing upstream is for
KartikPrabhu and flex14 joined the channel
#
[LewisCowles]
the composer.json and composer.lock edits should go for supporting php < 7.1
#
[LewisCowles]
5.6 consistently needs to be told to update composer.lock in order to install, otherwise complaining about php5.6
#
[LewisCowles]
cd2team/docker-php:5.6 fails with the same error as phpenv
#
[LewisCowles]
which means php official docker is likely to as well
#
[LewisCowles]
It's all whitespace which is so frustrating
#
[LewisCowles]
could the fixtures be part of the problem?
gRegorLove and nickodd joined the channel
#
Zegnat
“the composer.json and composer.lock edits should go for supporting php < 7.1” - this is why we call composer update instead of composer install. It will grab versions of the libraries supported by PHP 5.6
#
Zegnat
We could also decide to not ship a composer.lock, but honestly there is no harm in having that one around for developers. The lock file is not used when people install the project as a dependency to their own, so is not a blocker for usage within PHP 5.6 at all.
#
Zegnat
Hmm, alright, so to debug this I should get that PHP docker running I guess, then I can have a poke around.
#
[LewisCowles]
the VM might be the easiest way to play
#
[LewisCowles]
it is larger at 3gb vs <200mb
#
[LewisCowles]
but it has an ide within it
#
[LewisCowles]
I checked in so many areas before getting on GitHub and mouthing off at phpenv. Lucky too as it seems although most environments don't share their issue, at least one, based on official PHP is failing
#
Zegnat
I still feel like it is a matter of what version of libxml2 PHP was compiled against. But if I have access to a failing environment I can probably build a minimal failing case.
#
Zegnat
Having some random cases out of almost 400 test assertions in the mf2 parser fail is not really helpful in debugging ;)
#
Zegnat
thinks he has docker installed on his work laptop
#
[LewisCowles]
They aren't random though. The behaviour is consistent amongst those errant environments
#
Zegnat
Don’t really need an IDE or anything fancy. As long as I can write PHP even just `php -a` should be fine
#
Zegnat
I mean that big picture it is random why those specific tests fail. There are way more tests that have to handle HTML output, I think
#
[LewisCowles]
😉 the docker has xdebug, but you'd have to port bind to access from your PC
#
Zegnat
Why would anyone prefer xdebug over die('here!') ?! ;)
#
[tantek]
catching up, lots of config / dependencies stuff!
#
Loqi
utm_ has -3 karma over the last year
#
Zegnat
[tantek]: yeah, trying to figure out why the mf2 tests branch for the PHP parser is failing to output HTML correctly :(
#
[LewisCowles]
Zegnat, interactivity
#
[LewisCowles]
I wish there was a tool to diff PHP ini internal representations
#
[LewisCowles]
If I find out this is some obscure ini option I'll be glad to know, but upset
#
[tantek]
Zegnat++ thank you. That is very painful yet important work!
#
Loqi
Zegnat has 19 karma in this channel over the last year (53 in all channels)
#
[LewisCowles]
One of the failures is not HTML by the way Zegnat
#
[LewisCowles]
that was why I asked about fixtures
#
[LewisCowles]
```
- 0 => 'John Doe Jr.'
+ 0 => 'John DoeJr.'
```
#
Zegnat
That one is a bit weird.
[KevinMarks] joined the channel
#
Zegnat
But that one does also pass for me locally, so I think it is the same basic principle
#
Zegnat
The line behind that test is pretty weird HTML, so I am wondering if it is not the exact same issue of the XML parser clearing up whitespace, [LewisCowles]: https://github.com/microformats/tests/blob/bef8e1bf7c8a930613f6f6c23a472f49dff8405c/tests/microformats-v2/h-card/impliedname.html#L23
#
Zegnat
Especially since it does pass with the alternative HTML5 parser
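To show why a parser that drops whitespace-only text nodes would produce exactly that kind of name, here is a small Python illustration. It uses Python's `html.parser`, not PHP's DOMDocument, and the sample markup is invented rather than copied from the linked test file:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes, optionally discarding whitespace-only ones."""

    def __init__(self, keep_whitespace_nodes: bool) -> None:
        super().__init__()
        self.keep_whitespace_nodes = keep_whitespace_nodes
        self.chunks = []

    def handle_data(self, data: str) -> None:
        if data.strip() == "" and not self.keep_whitespace_nodes:
            return  # mimics a parser that throws away inter-element whitespace
        self.chunks.append(data)


def text_content(html: str, keep_whitespace_nodes: bool) -> str:
    parser = TextCollector(keep_whitespace_nodes)
    parser.feed(html)
    parser.close()
    # Join the text nodes, then collapse whitespace runs to single spaces.
    return " ".join("".join(parser.chunks).split())


SAMPLE = "<p><span>John Doe</span>\n\t<span>Jr.</span></p>"
print(text_content(SAMPLE, keep_whitespace_nodes=True))   # John Doe Jr.
print(text_content(SAMPLE, keep_whitespace_nodes=False))  # John DoeJr.
```

The only text separating the two spans is a whitespace-only node, so once that node is gone no later step can put the space back.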
nickodd left the channel
#
gRegorLove
Zegnat, maybe we could try installing a specific libxml2 version on trusty? https://docs.travis-ci.com/user/installing-dependencies
#
Zegnat
gRegorLove: maybe. I just want to create a minimal test case that actually shows the core issue.
#
Zegnat
Also if we can find the libxml2 issue, I think we can add it to the requirements in composer.json. So people installing it with old libxml2 are informed by composer about the issue.
#
[LewisCowles]
So, even when I removed all string replacement hacks, and ensured all alt text should come out with a space, there was no space for the dude name that gets mangled
#
gRegorLove
Makes sense, Zegnat
#
Zegnat
clearly needs some docker tutorials
[Paulo_Pinto] and [Murray] joined the channel
#
Zegnat
Whelp. Got `cd2team/docker-php:5.6` running and it uses the same libxml2 version as I have locally.
#
Zegnat
So that is out of the window :(
#
Loqi
it'll be okay
#
Zegnat
`echo PHP_VERSION . " " . LIBXML_DOTTED_VERSION;` => 5.6.40 2.9.4. Locally: 7.4.6 2.9.4
[KevinMarks]1 joined the channel
#
Zegnat
suddenly realises he is in a docker container without nano/vi/vim/ed/......
#
Zegnat
Can confirm that this gives 2 different results: https://gist.github.com/Zegnat/a94489e9b7d5501193e724e336bc6052
#
Zegnat
I have no idea why though
#
Zegnat
Yet another nail in the coffin for HTML parsing in PHP
#
sknebel
not having looked at the test results - is this something that makes one of the output variants noticeably worse?
#
sknebel
or just different?
#
Zegnat
Different. But I would say that makes it worse.
#
Zegnat
As whitespace is not kept from the original HTML
#
Zegnat
Which in at least one case breaks things by returning a faulty implied name
#
gRegorLove
So weird
#
Zegnat
I would say it was a PHP bug, except that it goes for all versions according to Travis. Then I thought libxml2, but according to what I see in my terminal right now those are the same between the docker thing and my local.
#
Zegnat
Anyone wanna toss up another idea? :P
[KevinMarks] joined the channel
#
[KevinMarks]
Some difference in utf8?
#
Zegnat
See the gist. [KevinMarks]. No special tokens. In fact, only using \n and \t, so not even accidental whitespace
#
gRegorLove
> "gives 2 different results" you mean in different PHP versions?
#
gRegorLove
or that saveHTML doesn't preserve the whitespace?
#
aaronpk
omg ownyourswarm has been frozen for 10 days
#
aaronpk
and i haven't been checking in enough to notice
[tw2113] joined the channel
#
aaronpk
and now it's gonna be a challenge to catch up while staying under the rate limits
#
Zegnat
I mean that running that PHP file on php 7.3 as provided in the docker that [LewisCowles] found (that seems to match Travis) versus running it on php 7.3 on my local machine echoes two different outputs. gRegorLove
#
Zegnat
Meaning that some environments (including Travis) fail tests, even when PHP version and libxml version (per LIBXML_DOTTED_VERSION constant in PHP) are the same
#
Zegnat
Somehow it seems that sometimes DOMDocument munches the whitespace. Haven’t written the test yet to see if it does it during parsing step, or during saving step.
#
Zegnat
My gut feeling is during parsing step, because the impliedname test fails, and that one does not rely on saveHTML (I think. As it is not an e- parse.)
#
gRegorLove
Gotcha. Unfortunately I'm out of my depth here so no suggestions :/
#
[KevinMarks]
Are there line ending assumptions somewhere in config?
#
[KevinMarks]
If you're using \n and it wants \r\n
#
Zegnat
I don’t think PHP has line ending configs? Would not know where if it has.
#
Zegnat
But systems are linux and macos, so should all default to LF rather than CRLF
#
Zegnat
Enough headaches for today. But can confirm it is the initial HTML parse that is wrong. Added a check to the gist: https://gist.github.com/Zegnat/a94489e9b7d5501193e724e336bc6052
#
Zegnat
Getting false for the TextNode in the PHP docker containers that [LewisCowles] linked to. Getting true locally :(
#
gRegorLove
I also get false, PHP 7.2.30
#
Zegnat
What environment is that?
#
gRegorLove
FastCGI, on dreamhost (my site)
#
gRegorLove
preserveWhiteSpace didn't make a difference, though it's supposed to be true by default anyway
#
Zegnat
Whelp. Guess php-mf2 is broken on your host then.
#
Zegnat
This is kind of what I was afraid of, a hosting setup like dreamhost having an environment where the DOMDocument parser is broken.
#
gRegorLove
It keeps the \n after </li> but not the others, hm
#
Zegnat
Means it is a good thing TravisCI surfaces these bugs.
#
Zegnat
But it is also extremely tricky to debug and fix. It honestly seems like there is just *something* in the environment that botches DOMDocument parser, but I have no idea what.
#
gRegorLove
I'll dig into this further see if I can figure it out for DH at least
#
Zegnat
`php -i` shows the same version of libxml, the same version of the dom api, everything, when I compare my local computer install (which gives true on the gist test and keeps the whitespace) and a local “broken” docker instance.
#
Zegnat
Sleep time on this side of the pond, but happy to hand this over to you, gRegorLove, haha. Looking forward to your discoveries in about 8 hours ;)
#
gRegorLove
Zegnat++ thanks for digging into this!
#
Loqi
Zegnat has 20 karma in this channel over the last year (54 in all channels)
#
gRegorLove
[LewisCowles]++ too
#
Loqi
[LewisCowles] has 9 karma in this channel over the last year (21 in all channels)
#
Zegnat
[LewisCowles]++ for providing the test bench where I could reproduce this locally
#
Loqi
[LewisCowles] has 10 karma in this channel over the last year (22 in all channels)
#
Zegnat
For clarity for people who want to reproduce: I have been running the cd2team/docker-php image, which has tags available cd2team/docker-php:5.6 to cd2team/docker-php:7.3
#
Zegnat
And they all seem to experience this edge-case issue
#
Zegnat
Now, bed. Cheers all!
#
Zegnat
cweiske++ for earlier brainstorming too! Need to be inclusive here!
#
Loqi
cweiske has 4 karma in this channel over the last year (10 in all channels)
#
gRegorLove
My env is PHP version: 7.2.30 | libxml version: 2.9.4
#
gRegorLove
So loadHTML is adding implied html and body elements. If I use `loadHTML('html', LIBXML_HTML_NOIMPLIED)` it preserves whitespace and I get true for Zegnat's gist above
#
Zegnat
… so you are saying you may have found the problem within 10 minutes? XD
#
gRegorLove
you're supposed to be asleep
[tantek] joined the channel
#
gRegorLove
(shouldn't have tagged, sorry, haha)
#
Zegnat
Phone went buzzbuzz! :P
#
Zegnat
Really shutting down now. But this is good. We maybe just need to make php-mf2 slightly smarter about parsing input strings then.
#
Zegnat
closes chat for real
#
gRegorLove
May be worth using flag LIBXML_HTML_NODEFDTD as well, https://www.php.net/manual/en/libxml.constants.php
#
[KevinMarks]
You need to be careful with no implied, as it still expects a containing root node, so can go wrong if you pass it a fragment without an xml node on the outside.
#
gRegorLove
Good point
#
gRegorLove
Tested this utf-8 issue from the comments, too. It converted to HTML entities https://www.php.net/manual/en/domdocument.loadhtml.php#118834
[benatwork] joined the channel
#
@mrkrndvs
↩️ What if we posted comments to other spaces from our own sites utilising the power of #webmentions? Both keeping a record of our conversations, as well as owning our opinions. #pcpopup2020
(twitter.com/_/status/1265773579762712576)
[LewisCowles] joined the channel
#
[LewisCowles]
Zegnat, some of those docker containers are ubuntu
#
[LewisCowles]
apologies, I've been napping
#
[LewisCowles]
apt-get update -yqq && apt-get install nano / vim (etc)
#
[LewisCowles]
you can also volume-mount. It should have all the PHP tooling you need (but extensions may need to be activated)
#
[LewisCowles]
gRegorLove that didn't work for me
[snarfed] joined the channel
#
[LewisCowles]
I provided the ubuntu VM to have the desktop friendliness to click through
#
[LewisCowles]
YES cracked it
#
[LewisCowles]
gRegorLove, your link to the flags works, but I find it frustrating that it works, as it means some PHP builds get default flags that others do not
#
[LewisCowles]
`LIBXML_NOENT | LIBXML_NOXMLDECL | LIBXML_PARSEHUGE` fix it for me, but it might be that it's one of them, so I now need to test it with variations of
#
[LewisCowles]
the first and last affect parsing, the middle affects saving
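Narrowing down which flag matters is a small combinatorial search. A generic Python sketch, where `passes` is a hypothetical callback that would run the test suite with a given set of flags (nothing here calls PHP or libxml):

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set


def smallest_fixing_sets(flags: Sequence[str], passes: Callable[[Set[str]], bool]) -> List[Set[str]]:
    """Return every smallest combination of flags for which `passes` is True."""
    for size in range(1, len(flags) + 1):
        hits = [set(combo) for combo in combinations(flags, size) if passes(set(combo))]
        if hits:
            return hits  # stop at the first size that fixes the tests
    return []
```

With three flags this is at most seven runs, so trying every combination is cheap compared with guessing.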
#
[LewisCowles]
before I went for a nap I was realising how many test cases there were while stepping through in a debugger
#
[LewisCowles]
and it works for me if it's just LIBXML_PARSEHUGE
#
[LewisCowles]
sea of green on travis. I do now feel the need to know how some PHP, of the same version, compile with that on implicitly and others need it specified
#
Loqi
php has -1 karma over the last year
#
[LewisCowles]
I'm furious
#
[KevinMarks]
Is it getting OS defaults from the C libxml2?
#
gRegorLove
The utf-8 issue is worth looking into but it's kind of an aside
#
gRegorLove
I didn't find docs for any php.ini settings for libxml so I'm guessing it might be OS level
#
[LewisCowles]
well the OS we're (mostly) experiencing this on is ubuntu
#
[LewisCowles]
and the packaged PHP works
#
[LewisCowles]
so if it is that KevinMarks, they've documented it somewhere
#
[LewisCowles]
and it's not made its way upstream.
#
[LewisCowles]
ah, I did check BSD, which behaved the same as Ubuntu.
#
[LewisCowles]
Ubuntu built-in, not ubuntu compiled.