#dev 2020-04-25

2020-04-25 UTC
jacky, it's one of the reasons we have the proposal to add 'id' as an attribute when present
actually we added that!
[sknebel] #44 parse HTML id attribute
but that's still only if it's present, no?
granted adding more increases the chances of _some_ unique identifier being available
point being that if there's a use-case, then people will publish it.
absent a use-case, making it 'required' won't actually help with improved data quality
it'll just result in people making stuff up to placate the validator etc.
fake required info is worse than lack of optional info
my 'request'/want is to require any microformats h-$object to have a URL
doesn't matter if it's a HTTP one (though that'd be very nice)
that would make mf2 in some page dependent on having an entire other page for each h-*
seems unreasonable. Especially with static pages, it would require having to synchronize the data between the page and the URL of the h-* object
or put an id on each h-* object
I don't see why we should put such burden on publishers
fragmented html can be used there, no?
like if my h-card only lives at https://jacky.wtf/#hCard, that doesn't seem any more complicated
it *can* be done. But requiring is a bit much
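A minimal sketch of the fragment idea above, assuming a publisher puts an `id` on each h-* root element: every such element then gets a URL of the form page URL + `#id`, with no extra page needed. The names `HIdCollector` and `fragment_urls` are made up for illustration, and this is not how any mf2 parser actually implements it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HIdCollector(HTMLParser):
    """Collect (root class, id) pairs for elements that have an h-* class and an id."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        roots = [c for c in classes if c.startswith("h-")]
        # Only elements with both an h-* class and a non-empty id qualify.
        if roots and attr.get("id"):
            self.found.append((roots[0], attr["id"]))


def fragment_urls(page_url, html):
    """Return (root class, fragment URL) for every h-* element with an id (hypothetical helper)."""
    collector = HIdCollector()
    collector.feed(html)
    collector.close()
    return [(root, urljoin(page_url, "#" + el_id)) for root, el_id in collector.found]
```

An h-* element without an `id` is simply skipped here, which matches the "only if it's present" caveat.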
I haven't come across mf2 that _didn't_ have a URL in it (yet) so I guess I'm leaning on that for implicit behaviors
sure. until you find a case where that strategy doesn't work, that is fine to use
bullet-proofing against something that just might not happen is not useful IMO
jacky, perhaps you could come up with a fragmention way to do that
in other news, I totally missed that js;dr turned 5 years old!
We're at the halfway point to when all those frameworks in 2015 will no longer work lol
Hello, I've been lurking a bit. Was looking for recommendations on web hosts.. they all seem the same to me. Right now my plan is to do everything myself using vanilla javascript. Any suggestions on hosting provider would be great. Thanks.
Sorry, think i should have put this in the indieweb channel.
you're in the right place TimApple++
what are webhosts?
Web hosting can be the primary regular cost in maintaining an IndieWeb site; this page lists several options from free on up depending on your publishing needs, like a static, shared, private, or dedicated server https://indieweb.org/webhosts
Bunch of recommendations in there ^^^ [TimApple]
jacky, when publishing information from non internet systems in read-only format (what I'm doing now); the id and a single URL just are not important
I know what you mean, and in an ideal world, I'd publish the private systems id and url because it wouldn't need to be private, but I think a worse case scenario is publishing details of a private setup others can't even reach
I think in your case mentioned here, https://chat.indieweb.org/dev/2020-04-24#t1587762799673600, it's fine to not even have HTML for the page
[[LewisCowles]] if a u-url is not public, or cannot be known at time of generation. Should it be omitted?
Right, but I want to have the HTML, I just am not sure of the value of providing a u-url or u-id people cannot interact with past consumption
h-feed makes sense, as will h-entry and h-card
similarly I'm not u-uid'ing or u-url'ing on any of those sub-entries.
In order to do that in a way I'd feel comfortable with I'd need to A, Identify members of a closed system elsewhere (small problem right now as it's just me)
B, provide the canonical u-uid and/or u-url for them, and the resources on their source
otherwise I'd be depriving those users of their independence
or needing to facilitate some form of mailbox for others / other systems resources on my own
distributed systems: Share the least surface area practically possible between two systems.
tbqh, I'm pushing my opinion on this w.r.t how things should work
like I don't expect it to be like always available over HTTP
(like I'd love to support things like Dat, IPFS and they can be represented with URLs)
also I think having a non-public URL is different from having one that can't be known at the time of generation
like when you say non-public URL; do you mean that it requires authorization, to be on a particular sub network range, etc?
it's both RE: sub-net and authorization. It's unlikely without hacking into one of my multiple home networks that you'd be able to get in, and then you'd need to pass about 4 layers of security
having my public IP also wouldn't help as I don't externalise any of the systems
it's about selectively, being able to output that content to the web
in static form
then transform that for web use-cases
such as subscription to a kanban board to monitor progress until it's complete
I suppose I could make the u-url the one on my site, but that was the point from last night at which I began to sigh. I'll need to build it as part of deployment. Only at deploy time should I know that URL and it's not intrinsic to exporting a KanBan board from the system I export from
So at deploy time: parsing an HTML document (jsdom looks most promising), transforming it, and uploading its transformed output (for a number of pages)
it's just a lot to unpack
KevinMarks, I just saw your comment about relative urls. I suppose I could use one of those for the feed
since I do control and mandate the file-system layout at point of export, just to keep the CSS, HTML and assets hanging together that might be a great way to package this.
tantek++ RE: placating validators
as an example, testimonials were once marked up as reviews for a customer. Well reviews have a rating, so what did every review get (although I gave the customer control to set their own rating). Everything is 5-star
And these were from regional companies, so large-ish names for a regional-provider company to showcase to boost authority and get some internet juice.
Placating validators is a big thing in schema.org world. Because valid HTML microdata could show errors in Google's validator. Rather than following spec, people follow what the Google validator demands
yeah tbf I got an alert yesterday that something from 2014 no longer matched google's requirements for a job advertisement. They now require salary, location, and some other data, which makes sense, but which they didn't need before
Why can't you use a relative u-url?
That fits the bundle and dat cases
> KevinMarks, I just saw your comment about relative urls. I suppose I could use one of those for the feed
I'm totally entertaining the possibility of exporting with a relative u-url
Also has the advantage of working locally with file urls
It is a trade off - one of the problems with atom feeds etc is relative urls. Adding a base at serve time can help. Otherwise what is fetching may need to preserve headers.
I suppose a relative URL could assist a deploy-time way to enhance properties to use non-relative URLs
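A rough sketch of that deploy-time step, assuming the export only contains relative u-url values and the base URL is known once the files are deployed. The function name `absolutize` and the example.com URLs are placeholders, not part of any real export tool.

```python
from urllib.parse import urljoin


def absolutize(u_url, base_url):
    """Resolve a (possibly relative) u-url against the base URL known at deploy time."""
    # urljoin leaves already-absolute URLs untouched and resolves relative ones.
    return urljoin(base_url, u_url)


# The same relative value can be resolved against different bases:
# an https:// base after deployment, or a file:// base when browsing the export locally.
deployed = absolutize("board-1/index.html", "https://example.com/boards/")
```

The trade-off mentioned above still applies: whoever consumes the relative value has to know the right base to resolve it against.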
it's a very involved process
I may well come out the other side with "here's why I decided to side-step" parts or the whole thing
I want to be able to export database and app-reliant things to static output. I have that part done, submitted to upstream for approval.
Enhancing it is a nice side-step that could enhance the ability to get the data back out, which is also part of my initial to-static in the form of a proprietary JSON format
the JSON is in-fact easier to export en-masse with a list of board PK's, but it's inert
the HTML I feel represents something of-use in a broader range of contexts. Which micro-formats could help with.
!tell jacky from yesterday’s call, I think this is what both PHP and Python parsers have tested with as a possible whitespace cleanup alg: https://wiki.zegnat.net/media/textparsing.html
Ok, I'll tell them that when I see them next
Zegnat is this just to make pretty JSON?
because HTML ignores whitespace. I was blinking a fair bit when modifying whitespace came up
[LewisCowles]: the mf2 spec says to use textContent on a DOM node to extract plain text values. textContent is a concat of all TextNodes in the tree, which includes all the whitespace used to indent HTML.
textContent does not normalise whitespace in HTML the way a browser would do for rendering.
People have found straight up textContent way less useful for getting a plain text value. So we have been doing some brainstorming around how to normalise it in a way that plain text values would be more akin to what you would expect in a plain text format (like maybe a tweet)
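To make the difference concrete, here is a small sketch that concatenates all text nodes (roughly what textContent gives you) and then applies a deliberately naive cleanup. The cleanup is an assumption for illustration only: it collapses every whitespace run to one space, which is not the algorithm on the linked page or in the parsers, and it ignores paragraph breaks entirely.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Concatenate all text nodes, approximating DOM textContent."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_content(html):
    """Return every text node joined together, indentation whitespace included."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


def collapse_whitespace(text):
    """Naive cleanup (illustrative only): one space per whitespace run, trimmed."""
    return re.sub(r"\s+", " ", text).strip()


html = '<p class="p-name">\n    Hello\n    <b>world</b>\n</p>'
raw = text_content(html)          # keeps the newlines and indentation from the markup
clean = collapse_whitespace(raw)  # "Hello world"
```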
so you're only running it on p-* types, not e-* types
Running it when the spec asks for textContent. Which is on the `value` property of e-* types, but not on the `html` property.
We kind of want to approximate what a browser would give you when calling .innerText (https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute)
But turns out implementing HTML innerText (different from DOM textContent!) is a bit of a slog when you aren’t a render engine (ie. browser): https://github.com/Zegnat/php-innertext
[Zegnat] php-innertext: 🏃🐉 Run. Here be dragons.
implementation details in PHP parser https://github.com/microformats/php-mf2/pull/168
[Zegnat] #168 New algorithm for plain text values
[Zegnat] #15 What should mf2 textContent parsing result in? User expectation vs. DOM specification.
[LewisCowles]: that history may clear it up more than me summarising in chat :D
I've unfortunately necro'd on that GitHub issue
it seems to be a case where following a spec is problematic because it's too specific, but when I threw in some HTML from a source. I didn't get the output I'd expected
the spec marking as a computed field, even if cached computed from the saved HTML would make a tonne of sense to me
this is a projection of whatever algorithm fits your use case to text-ify (pardon that marketing jargon BS) rich content as less-rich content
I've also hinted that plain-text might have a variant depending on application
so also it would mean in taking "text" from a h-entry e-content or similar on your site. I'd know, I either had to go back to the source (your site) at some arbitrary period (as algorithm improves); or rely on a pre-cached part as e-content already provides
jacky would then have a very good case for mandating u-url props for that case
The idea with mf2 is to get structured data ready to work with in a way that abstracts away the HTML parsing. So people do not want to go back to having to parse the HTML themselves to find innerText. Makes sense to then make the argument for switching from textContent to innerText in the mf2 spec (which is that issue)
Yeah. I think this particular case comes down to people expecting innerText and getting textContent though. innerText is what is expected because it is what browsers would display, even text browsers like Lynx. textContent is much more like serialised AST output and therefore part of the DOM standard.
lynx in this case displays very differently
I'll attach
and it's obvious that it won't be able to cover all use cases -> which is why the source format is exposed for e- if you want some text format that keeps different aspects of the source (i.e. because you want to convert to markdown)
for complex input, turning it into plaintext is always going to be a best-effort kind of thing, not something that promises to be universally usable
[LewisCowles]: nice comment on the issue!
basically I'm not knocking the algorithm or the work
I'm suggesting that there may be an as-yet unconsidered pivot in spec
Yeah, somewhere in the thread it was decided that we would test with a single \n between paragraphs. innerText asks for 2, and that is what you are seeing in Lynx, which I personally find a lot better
when or how-often text is regenerated is mildly unimportant
I wonder how https://github.com/Zegnat/php-innertext would hold up against that piece of test HTML you ran there, [LewisCowles]. As that actually tries to implement innerText
I may run it for fun
Although interestingly enough, lists never get their numbers, I think. Not even in the HTML innerText model. And I do not think we want to get into that either as you will just find yourself writing Lynx at some point :P
without writing that it might be that `text` is not very useful at all, and should be dropped
Many years ago (2009-2010) I had a project from a business my sister was group operations director for. They were sending image-only emails wondering why response rates were < 1%
To me it's natural to think. "Oh if people can't read it, then it won't matter how well targeted your communication is"
Thing is that the issue originated with heavy mf2 consumers, not with the spec creators, so clearly there is a need for this in application context. And if we can address a lot of those needs by running minor white space cleanups, I feel like we should try and spec those necessary cleanups for all parsers
Did a bit of work on creating alt-text fallback for those, and HTML content for those with text part fallback
in the end > 80% of people read the emails. Because the issue in creative presentation had been addressed
you're right in that it makes parsing more expensive. I had the benefit of a manual process
without valuable text content, we're just pushing bytes to people
GWG my default PHP testing and linting setup right now, would love to hear if you have any improvements for me! https://github.com/Zegnat/php-website-starter/blob/5adf61a211b927d1b1d8de6bcefae15696892aa3/composer.json#L24-L33
None specifically, WordPress has its own stuff
I did add in phpcpd which helped in checking for duplicate code
Hmm, that one might be interesting to add to my default tests too
😢 https://github.com/PuerkitoBio/goquery source documents must be UTF-8.
I can guarantee this for me. I can wish it for others. I can't force it and may have difficulty spending time and effort ensuring it.
[PuerkitoBio] goquery: A little like that j-thing, only in Go.
ugh why did that guy take down picofeed from packagist
it broke composer when there's a dependency that uses it
hmm yeah, I had a weird hiccup this week trying to upgrade xray
but couldn't pin down what was wrong, had to remove and require packages (but not sure what commands I did in order, already forgot)
Do you not have a picofeed fork, aaronpk? Point there?
i do, but i'm stuck in composer
i'm trying to just tell it to update to the latest xray which uses my picofeed fork, but it's saying it can't do it
also confusing is when i try to install an older xray, it tries to install the old picofeed even though it shouldn't
also the damn zendframework/zendxml -> laminas/laminas-xml renaming is throwing this for a loop
(I did the thing again where I started writing the url as aaronpk.com/github, but this time I knew I could continue typing)
aaronpk: does your picofeed fork have a replace option in its JSON? https://getcomposer.org/doc/04-schema.md#replace that way you can tell Composer that it can replace picofeed. Then other dependencies you may have that want picofeed will all know they can use yours.
oo didn't know about that
But yeah, the Zend breaking into Laminas has been a little … annoying
i think something else is requiring the zend xml somewhere
this is ridiculous, this is the whole reason for composer.lock
Does composer.lock break when something is pulled from packagist? That is … bad
I never actually had it happen
yeah it can't download the package so it just freaks out
it's trying to download a release from github, github returns 404, which is also what they do with private repos, so composer says please enter your github password
a case for vendoring dependencies. Like a fat-jar / package cache
i mean i could just copy this whole vendor folder from my laptop to the new dev server but this just seems bad
“it's trying to download a release from github, github returns 404” that sounds like the lock file can’t help out there though, if it also no longer exists in GitHub..?
yeah that's what i mean
attempts to hand-edit composer.lock
what could go wrong
omg i think it worked
aaronpk: I had a composer.lock issue before as well, when trying to update php-mf2
I sympathize
i think chrome is giving me the wrong TLS error
it says "certificate revoked" but that's clearly not true
well that worked
tries generating a new certificate
\o/ worked
aaronpk: is that still about the composer stuff, or for your dev environment?
the solution in the link is really wow
i got composer working
now on to ssl
i'm moving my dev environment to a VM so that i don't have to bog down my poor laptop with it
lol package dependency systems
whoops gonna fill up the VM drive with my website storage oops
oookay new dev copy of my site is installed!
but now it's too late to do whatever i was planning on actually doing on my site
which i also forgot
jacky: Zegnat left you a message 10 hours, 13 minutes ago: from yesterday’s call, I think this is what both PHP and Python parsers have tested with as a possible whitespace cleanup alg: https://wiki.zegnat.net/media/textparsing.html
oh hm
something broke docker. A build normally taking 30-45 minutes has now taken over 2.5 hours
should restarting daemon fix it? Trying that now
lol I do that plus `docker system prune`
ah I've only been pruning containers and images jacky++
1/ Dear LazyWeb, need an obscure database recommendation. So, I’m migrating my blog from my 2014 Mac to new 16" Catalina box. I wrote it in 2002 and it’s in Perl. Has a backing database in mysql. However, it’s essentially impossible to use DBD::mysql on Catalina.
ok, I added "https://twitter.com/timbray/status/1254143366914162688?s=20" to the "See Also" section of /database-antipattern https://indieweb.org/wiki/index.php?diff=69640&oldid=69552
perl-tax 😉
resists replying with: well, there's more than one way to do it
just a joke because I never fully got PERL
[snarfed] joined the channel
I did in fact write some perl last year to do something similar - use perl on a Linux box to export data on an mssql database to text files so I can import it into AWS sql
And now we're turning off the windows and Linux box I need to remember how to do it again to get the changes since
jacky, seems I had to nuke the file-system the repo resided in. Mutative build systems ugh...
[jgmac1106]: "Fork" is clearly dev-term, and a mixed one at that. I.e. originally "forking" a project meant making a copy and continuing to work on it independently.
whereas e.g. GitHub uses it more generically for "any copy of the project", even those that just exist temporarily to work on something that is contributed back
[KevinMarks] RE: keep all the versions, that is most definitely a dev topic
I support it
Github's big culture change was fork first, ask questions later
Thx sknebel.. That one seemed easier... It is the difference between embedding and inclusion I don't understand... Wiki people always talk inclusion
last year I came back from holiday to find people in-place updating software, because new was better; but without measurements and buy-in outside of engineering, that is at best posturing
feature-flags and systems configuration are wonderful beasts to enable keeping at least most of the versions
language however seems simplest when mutable
and precise
defining new higher-order publics should not replace or interfere with lower level systems
[jgmac1106]: Mediawiki specifically calls it "transclusion" afaik, and I think the user-facing part is IMHO more interesting than any technical aspects. to me, "embed" is more "you embed an entire thing (document, video, Instagram post, ...) as a box on a page", vs wiki transclusion transparently puts things in the page. E.g. the micro-hcard templates on the wiki replace the text inline, it's not obvious that it's from somewhere else
I will challenge that nodejs and just spin up more machines have aged terribly
(and mediawiki can do this also with sections of pages etc)
We're still trying to find good ways to make this behaviour make sense for people - to know when changes will ripple through and when they won't
Sometimes it starts with describing the behaviour, even if in the knowledge it will be revised in future
Ahh okay thx sknebel... Fits the work wikimedia is doing to unite all citations into wikidata
I like the ability to track unfurl, transclusion, embedding. I must admit one of those terms did make my eye twitch when first mentioned. It seemed distinctly problematic
I find the mix of verbs and nouns problematic, or even the disparate framing between how something is/was done vs the effect that is achieved
outcome based terminology has its place. I think 1 of the three is solely relegated to silos
The mixed / interchangeable use of transclude vs include, which I see a lot, is also annoying
(they're quite different)
yeah, it's wikipedia hints at some of the problems with it. It feels a lot like someone coined the term to avoid plain language and engage in a bit of chimping
chimping being a polite term for a very specific public display from on-high
also no one calls them "inclusions", or the process "inclusion". They come from C, #include and includes.
and later even more confusingly renamed, import and imports
lol never heard that dfn/usage of chimping
I thought I best dfn before another use gets mis-attributed. TBF I should have used the M word, we're all adults AFAIK
BillSeitz, whoa which custom engine?
I recently got IndieAuth working for my site again. But when I log in anywhere with it, the remote site shows my ID/name(?) as "Webseitz.fluxent.com wiki FrontPage" rather than "Bill Seitz". How do I fix that? I suspect it's in some combo of rel="me" and h-card. http://webseitz.fluxent.com/wiki/FrontPage
(custom engine)
I see an explicit name defined too
what's 'anywhere'? like what sites have you tried?
Tried indieweb.org
at least the wiki just always uses the url you identified with as your "username", so that's expected
billseitz_: the URL that sites use as your indieauth URL is up to your site to return
so if you want to use just your domain, you'll need a web page at the root of your domain
right now it's just blank http://webseitz.fluxent.com
and the wiki doesn't have any extra display name
So, even if I do that, the outcome will be a “username” on these sites that’s my domain, not, say, the name in my h-card
correct, otherwise anyone could just say their username is whatever and log in as other people
right. because that's clearly tied to you. tools can have a different display name, but the username needs to be a thing you can prove control over - a URL
if you want the thing on the wiki where you can use {{nickname}} then the wiki has a special way to do that which has nothing to do with authentication https://indieweb.org/wikifying#Step_Four:_Add_a_sparkline
gotcha, thx
2 more terms for this there, template and sparkline
one last compile and then sleep
managed to get a working copy of h-feed and export into WeKan export. Need to run it through x-ray
cry at failures etc