#aaronpkhmm new twitter char counting failed in the same way
#corlaez429s are, I think, impossible to implement for static websites. What if we take a more aggressive stand and just block Google altogether until they change their crawling decisions (maybe never, but I feel everyone can implement this and it could create more buzz)
#corlaezWell, for some static websites that do not control the actual server hosting them
geoffo, jacky, Seirdy, angelo and gRegor joined the channel
#gRegorDo crawlers mostly ignore ETag and Last-Modified? Otherwise seems like those would be preferable over a sitemap with <lastmod>
geoffo, gRegorLove_ and mro joined the channel
#@ton_zylstra↩️ that was also the point: make WP compliant with microformats2 and the classes for post kinds, so that themes and blocks can work with them by default. If you then turn on Webmention, suddenly 40% of the web is an open social platform, outside the silos. (twitter.com/_/status/1570685363835863045)
jjuran, gRegorLove_, mro_ and mro joined the channel
#capjamesgangelo I have to start processing 429s in IndieWeb Search.
#capjamesgangelo IndieWeb Search crawls your home page and spiders out from there if you don't have a sitemap.
#capjamesgBut... URLs in a sitemap do get priority in IndieWeb Search.
#capjamesgThat's only because they are discovered first and URLs are crawled in order of discovery.
#capjamesgUnless your site has over 30,000 pages, this is inconsequential.
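The ordering described above can be sketched as a plain first-in, first-out queue. This is only an illustration of "crawled in order of discovery", not IndieWeb Search's actual code; the function name and the example URLs are hypothetical.

```python
from collections import deque


def discovery_order(sitemap_urls, homepage_links):
    """Return URLs in the order a first-in, first-out crawler would visit them.

    Sitemap URLs are enqueued first, so they are crawled first. There is no
    separate priority score. Duplicates keep their earliest position.
    """
    queue = deque()
    seen = set()
    for url in list(sitemap_urls) + list(homepage_links):
        if url not in seen:
            seen.add(url)
            queue.append(url)

    order = []
    while queue:
        order.append(queue.popleft())
    return order


if __name__ == "__main__":
    print(discovery_order(
        ["https://example.com/a", "https://example.com/b"],
        ["https://example.com/c", "https://example.com/a"],
    ))
```

With a small site every URL is reached quickly either way, which is why the head start only matters for very large sites.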
mro and tetov-irc joined the channel
#[tonz][snarfed] wrt ‘it may add up to significant bandwidth for big sites’: an example given was a single-page WordPress site with 4 files, getting single-digit visitors per week, content not changing, still getting 600K hits from bots and crawlers per month, in part because of all those unnecessary URLs (and crawlers being wasteful themselves). A big site saw the Google crawler hit their site every 2 seconds to index the whole thing. So it’s
#[tonz]probably significant for every site (until crawlers learn / are taught to tone it down)
jonnybarnes, angelo_, jacky, jordemort and jacky__ joined the channel
#jordemort(moving from #indieweb) i'm working on a client-side search engine for my static site using sql.js and had the thought: what if there was something like <link rel="sqlsite" href="/path/to/sqllite.db" /> that served up a sqlite database with all of a site's posts indexed in some sort of agreed-upon schema? (prolly based on h-entry)
#jordemortthen it'd be easy to build some "standard" client-side search javascript, or CLI tools to search sites that implemented it, or metasearch engines where you could pick sets of sites to search together
#jordemortso clients wouldn't even have to download the whole index, just range-request what they need
#[schmarty]1this would be relevant for capjamesg who has been building his own indieweb search crawler/index/etc https://indieweb-search.jamesg.blog/
#jordemortregular sql.js works fine over http too, it just has to fetch the whole database; the range requests are the real magic in httpvfs
#[schmarty]1federated and cross-site search, where each site hosts their own search and some tool makes multiple requests and aggregates results, has come up a few times but it has a lot of variables and i don't know that anyone has made a real go of it.
#[schmarty]1jordemort: personally i'd be willing to try it out once you have something working for your site that is replicable!
[Will_Monroe] and jacky joined the channel
#jordemortunrelated, except that i'm doing my indexing by parsing my mf2 metadata: did i miss it or is there no standard way to mark up tags/categories in h-entry?
#Loqimisses it or is there no standard way to mark up tags too
#jordemorti've started using `p-tag` which all the parsers seem to pick up
#[schmarty]1parsers are vocabulary-agnostic, so they'll pick up `p-anything`
#Loqitags or tagging refers to categorizing or labeling content, your own or others (tag-reply), with words, phrases, names, or other information, optionally linked to specific people, events, locations, such as the practice of tagging posts being about certain people (person-tag), like tagging people or other items where (area-tag) they're depicted in a photo https://indieweb.org/tags
#[schmarty]1there's a how to markup section there. `p-category` is most common, i think
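For indexing purposes, pulling `p-category` values out of a page can be sketched with only the Python standard library. This is a simplified illustration, not a microformats2 parser: it only collects the text of elements whose class list contains `p-category`, it miscounts nesting if such an element contains an unclosed void tag like `<br>`, and the class and function names are hypothetical. A real mf2 parser is the better choice for anything beyond a quick check.

```python
from html.parser import HTMLParser


class CategoryCollector(HTMLParser):
    """Collect the text content of every element marked up as p-category."""

    def __init__(self):
        super().__init__()
        self.categories = []
        self._depth = 0  # nesting depth inside the current p-category element
        self._text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Already inside a p-category element: track nested tags.
            self._depth += 1
        elif "p-category" in classes:
            self._depth = 1
            self._text = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.categories.append("".join(self._text).strip())

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


def extract_categories(html):
    """Return the p-category values found in an HTML string, in document order."""
    collector = CategoryCollector()
    collector.feed(html)
    collector.close()
    return collector.categories
```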
#jackymight document the site if I eventually do some manual POSSEing
#jackyit even goes as far as having acquisition info and MSRP versus purchase price
jacky joined the channel
#angelo_re: <lastmod> vs ETag/Last-Modified; one request to a sufficiently marked up sitemap will allow a bot to pinpoint the four documents out of thousands that need to be re-crawled. conditionally requesting still requires every document to be hit.
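A minimal sketch of that single-request approach, assuming a standard sitemaps.org `<urlset>` document and a stored timestamp of the previous crawl. The function names are hypothetical, and fetching the sitemap is left out so the example stays self-contained.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value):
    """Parse a W3C-style <lastmod> value into a timezone-aware datetime."""
    value = value.strip()
    if value.endswith("Z"):
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Date-only or offset-less values are treated as UTC (an assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_changed_since(sitemap_xml, last_crawl):
    """Return the <loc> values whose <lastmod> is later than last_crawl.

    last_crawl must be timezone-aware. Entries without <lastmod> are skipped;
    a real crawler would need another way to decide about those.
    """
    changed = []
    root = ET.fromstring(sitemap_xml)
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if not loc or not lastmod:
            continue
        if parse_lastmod(lastmod) > last_crawl:
            changed.append(loc.strip())
    return changed
```

Only the URLs returned here would then need to be re-fetched, instead of sending a conditional request for every document.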
angelo joined the channel
#[snarfed]jordemort adoption is usually the biggest challenge with any idea like this. there's a big established ecosystem around the existing adopted standards (HTML etc) and big established search engines. they may not be perfect, but they're fully adopted
#[snarfed]new ideas like this will struggle to get more than a few sites to adopt them at the beginning, so the resulting search engines' indexes will be unusably incomplete, so people won't use them much, so other publishers won't be incentivized to adopt
#[snarfed]one way to handle that is to supplement results with existing search engines until adoption hits critical mass
#[snarfed](and this is all still just considering centralized search. federated search, ie send the query to all/many nodes and compile the results, I don't even know how to begin thinking about, so much of it seems so intractable. I'm honestly curious how the fediverse does it, if at all)
jeremycherfas joined the channel
#jordemorti don't know if i necessarily care about massive adoption / uptake 😉
#@wikipediachain↩️ XSL > Web Ontology Language > Semantic HTML > RDF/XML > DOAP > Rule-based system > Simple Knowledge Organization System > Agora (web browser) > IndieAuth > XPointer > XSL > WebXR > Oculus Rift S > Hack (programming language) > List of mergers and acquisitions by Meta Platforms (twitter.com/_/status/1570825778585010178)
#jordemortbut i'll prototype it on my blog, maybe turn a couple books into other example sites for a demo of "federated" search
#angeloso i just began consuming opensearch files; see https://indieweb.rocks/adactio.com left column below his card; if you try a search you'll see that you wind up on his site and that the results are marked up
#angelowhereas my immediate experience with tumblr is that many/most people stray from the default theme
geoffo, barnaby, [Jamie_Tanna] and jacky joined the channel
#angelothat said, they do have sitemaps with <lastmod>: https://indieweb-test.tumblr.com/sitemap1.xml ; while i expect tumblr users don't do much updating of old posts i feel confident finding the few that do would be doable only by watching the sitemaps
#angeloETag on the sitemap.xml would be *chef's kiss*
AramZS, gRegor and Ruxton joined the channel
#[KevinMarks]Also I need to write tests for the different post and composite cases as I think I may have some of them wrong. Then we can evangelize other theme authors once we show value
jacky, tetov-irc, nertzy, gRegor and geoffo joined the channel