#microformats 2020-01-28

2020-01-28 UTC
[tantek] joined the channel
#
[tantek]
jamietanna[m] that's great! There's a couple more things you could consider adding
#
Loqi
[tantek]: jamietanna[m] left you a message 1 hour, 50 minutes ago: first draft of my Google MF2 support post can be found at https://gitlab.com/snippets/1933801 - let me know what you think (and others reading this, too!)
#
[tantek]
e.g. total number of years that Google has now supported microformats
#
[tantek]
and something like: With the removal of data-vocabulary support, microformats is now the metadata format that Google has supported the longest. With the continued distributed growth of microformats2 across the IndieWeb, we expect Google will extend its microformats support accordingly.
#
[tantek]
would be great to have an updated Indiemap crawl of mf2 to cite a specific number of sites / pages etc.
[cleverdevil] joined the channel; Prabhaav|SimpleI left the channel
#
[snarfed]
new indiemap crawl would be nice, agreed, but not as useful for "how many sites have mf2" type questions, since the set of sites is relatively small and specific to the indieweb community
#
[snarfed]
it's better for questions about the indieweb community specifically. and even then, more for relative proportions than absolute numbers
#
[tantek]
snarfed, would you include mastodon sites / toots as part of the crawl?
#
[tantek]
because those have microformats also
#
[tantek]
certainly the media consider Mastodon as part of the IndieWeb 🙂
#
[snarfed]
all of mastodon is a much larger overall corpus than indiemap targeted, so no, probably not, if only out of feasibility
#
aaronpk
heh you'd make a lot of people mad including their mastodon toots in the crawl data
#
[snarfed]
> For this dataset, I focused on web sites that have interacted with the IndieWeb community in some meaningful way.
#
aaronpk
there's plenty of precedent for that kind of problem already
#
[snarfed]
hah maybe. i obviously would only include public toots. but still, agreed
#
[snarfed]
we've had this discussion lots about rendering wms
#
aaronpk
trying to remember the link
JenSelter joined the channel
#
[snarfed]
skimming. seems...aggressive, but reasonable
#
[snarfed]
i actually got one request to remove a site from indiemap a bit ago, which i did happily. not the same, but related
#
Loqi
[csarven] #2 Request to omit all statements on csarven.ca
#
aaronpk
one particularly bad problem was the researchers republished the content under cc0, which is absolutely not right
#
[snarfed]
oh yeah, very. i carefully disclaimed that in indiemap. https://indiemap.org/docs.html#Crawled+content
#
Loqi
[Ryan Barrett] Indie Map
#
[snarfed]
also they included unlisted toots and evidently ignored robots.txt
#
aaronpk
ouch yeah
#
[tantek]
good points snarfed
#
[tantek]
I wonder if crawling and only keeping aggregate data (# of public toots, # of microformats found) would be acceptable (i.e. not keeping any of the text or authors)
cyr-, cdcarter-wiki-vi, PamelaAlexa, GWG and [snarfed] joined the channel
#
[snarfed]
[tantek] eh the main part that takes time and effort is running and babysitting the crawl in the first place. after that, keeping the data (and uploading etc) is pretty easy
[Christina_Hendr, KartikPrabhu, [Marlin_Forbes], [KevinMarks], [tantek], [LewisCowles], [Rose], jgmac1106, [jgmac1106], BrilliantRose, GabbyWest, cyr-, [xavierroy], CandiceSwan, AnnaNystrom, Bogie1, [snarfed], jmac, [AlisonW], [cleverdevil], [schmarty], [Michael_Beckwit, [Jamey_Sharp], [jeffpaul] and chrisaldrich joined the channel
[Michael_Beckwit, [Jeannie], [jgmac1106], [mapkyca] and KartikPrabhu joined the channel