#dev 2021-08-27
2021-08-27 UTC
rockorager, Seirdy, jeremy, gerben, jeremycherfas and hendursa1 joined the channel
# capjamesg[d] Someone started a discussion about h-resumes somewhere.
# capjamesg[d] GWG maybe?
# capjamesg[d] Anyway...
# capjamesg[d] I think there's something interesting about assuming data marked up with mf2 is fine to use without permission.
# capjamesg[d] My initial thought was this is okay because the data has been deliberately structured in a way that makes it readable by external applications.
# capjamesg[d] Then again, I thought to myself that reading mf2 might be treated like web scraping if you did it across multiple sites without permission.
# capjamesg[d] I wonder: can anyone read structured data if it is for a search engine?
# capjamesg[d] I actually asked myself this question about a month ago. If businesses mark up their coffee products with JSON-LD (which they often do because of Shopify's SEO features, etc.) then could that be used to make a search engine?
# capjamesg[d] (I don't want to do this. It's a rather interesting thought experiment.)
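(Editorial aside: a minimal sketch of the thought experiment above, showing how product data marked up with JSON-LD could be read from a page using only the Python standard library. The sample page, the product name and the helper names `JSONLDExtractor` and `extract_products` are assumptions for illustration; nothing here comes from capjamesg's search engine.)

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks rather than failing the whole page
            # A JSON-LD block may hold one object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_products(html):
    """Return every top-level JSON-LD object whose @type is exactly "Product"."""
    parser = JSONLDExtractor()
    parser.feed(html)
    parser.close()
    return [
        item for item in parser.items
        if isinstance(item, dict) and item.get("@type") == "Product"
    ]


# Hypothetical page, loosely modelled on what a shop theme might emit.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Example Espresso Blend",
 "offers": {"@type": "Offer", "price": "9.50", "priceCurrency": "GBP"}}
</script>
</head><body></body></html>
"""

products = extract_products(SAMPLE_HTML)
print(products[0]["name"])
```

This only handles top-level objects with a plain string `@type`; real pages also nest products inside `@graph` or use a list of types, which a fuller reader would need to handle.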
# capjamesg[d] I am not in the mood to build *another* search engine 😄
# capjamesg[d] Just improving mine.
# capjamesg[d] The next feature coming up is named entity recognition. So if you type in "cairngorm coffee founder" you'll get the exact answer in bold before any search results.
# capjamesg[d] Technically the logic could be applied to any index. But it's just for my blog right now.
# capjamesg[d] I am now starting to run into performance bottlenecks though. Trimming my code will help but I may need to look into loading models faster.
# capjamesg[d] And I need to train some models on my blog rather than default datasets.
hendursa1, tetov-irc and [KevinMarks] joined the channel
# [KevinMarks] reading data from the web is OK, but combining it with other information to create a profile of an individual is something that GDPR explicitly calls out - see the discussion of profiling here https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general[…]ights-related-to-automated-decision-making-including-profiling/
# capjamesg[d] That makes sense.
# capjamesg[d] Because aggregated data doesn't have the same integrity as primary data.
# capjamesg[d] Whereas one source paints a picture at a moment in time, if you aggregate two sources from different moments in time without being clear on that, you can paint a picture that the person who created the data would not think is representative.
# capjamesg[d] Well, it's not that aggregated data doesn't have the same integrity. Bad choice of word. I hope people get my point 🙂
# capjamesg[d] What is search?
# Loqi search in the context of the IndieWeb refers to being able to search your personal site for your own content https://indieweb.org/search
chenghiz_, doosboox8, hendursaga and [fluffy] joined the channel
# [fluffy] Hm, I wonder if what Authl does with user profiles counts as a potential GDPR violation. It only collects data that’s publicly-visible on the user profile that someone logs in as, and in Publ it only holds onto it for the purpose of displaying the user profile for making admin decisions and to make the site slightly friendlier to the person who logged in, but I’ve gotten a couple of folks sending me GDPR disclosure requests with
# [snarfed] [fluffy] in practice, pretty much all indieweb sites and services are exempt from GDPR (and similar laws elsewhere like CCPA and LGPD) because we're small and generally non-commercial. details: https://brid.gy/about#gdpr
# [fluffy] Yeah, that’s what I tell folks when I respond, that 1) beesbuzz.biz is a personal site and 2) the sum total of what I collect can be seen at https://beesbuzz.biz/profile
[aciccarello] joined the channel
# [snarfed] obligatory: https://xkcd.com/386/
# Loqi [XKCD] Duty Calls https://imgs.xkcd.com/comics/duty_calls.png
# [snarfed] sure. https://indieweb.org/GDPR looks pretty good. might deserve a clearer "The Indieweb is pretty much entirely exempt!" warning at the top, but otherwise seems ok
# capjamesg[d] snarfed How much mf2 do you use?
# capjamesg[d] I only read a bit for search right now but I might need to add more support.
# capjamesg[d] Sorry... I meant how many different mf2 formats do you use?
# capjamesg[d] (i.e. h-entry, h-card)
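(Editorial aside: one rough way to answer "how many different mf2 formats does a page use" is to count the root class names, such as h-entry and h-card, that appear in its markup. The sketch below is an approximation using only the standard library, not a microformats2 parser; a real indexer would more likely use a dedicated parser such as mf2py. The function name `count_root_classes` and the sample markup are assumptions.)

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Approximation of an mf2 root class name: "h-" followed by lowercase words
# joined by hyphens. This deliberately rejects utility classes such as "h-4".
ROOT_CLASS = re.compile(r"h-[a-z]+(?:-[a-z]+)*")


class RootClassCounter(HTMLParser):
    """Count how often each h-* root class name appears on any element."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if ROOT_CLASS.fullmatch(name):
                self.counts[name] += 1


def count_root_classes(html):
    counter = RootClassCounter()
    counter.feed(html)
    counter.close()
    return dict(counter.counts)


# Hypothetical post markup for illustration.
SAMPLE = (
    '<article class="h-entry">'
    '<a class="p-author h-card" href="/">Author</a>'
    '<div class="h-4 e-content">Hello</div>'
    "</article>"
)

counts = count_root_classes(SAMPLE)
print(counts)
```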
# capjamesg[d] This is the time I wish I had bought an Apple Mac.
# capjamesg[d] I loved your charts in the talk. I had read them on your IndieMap site a little while ago though so they weren't new to me.
# capjamesg[d] I loved the rel=me link distribution chart. That made me laugh.
# capjamesg[d] As did your 15 or so line crawler.
# capjamesg[d] My crawler / indexer is roughly 831 lines of code.
# capjamesg[d] Probably add another 400.
# capjamesg[d] But I'm processing web pages, reading markup, etc.
# capjamesg[d] I am at 2013 now on your site haha.
# capjamesg[d] Amazing!
# capjamesg[d] How is the data in the WARC file ordered?
# capjamesg[d] Do you keep your /<list> lists up to date like /chocolate and /beer?
# capjamesg[d] Oh I really need a faster computer...
# capjamesg[d] I have a few similar lists for coffee but I have never made a list of all the roasters from which I have ordered coffee. I should do that.
# capjamesg[d] Me too. I don't update my lists very often but I do reference them every now and again in conversation.
# capjamesg[d] So they have a use case.
# capjamesg[d] snarfed It's still going.
# capjamesg[d] About that...
# capjamesg[d] 15 seconds later the program failed.
# capjamesg[d] I'm just going to index 1000 right now to make sure everything is okay before continuing.
# capjamesg[d] Funnily enough I was thinking about this recently.
# capjamesg[d] Well...
# capjamesg[d] "poutine" returns a result 😄
# capjamesg[d] inurl: filter working fine.
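(Editorial aside: the chat does not show how the inurl: filter is implemented. As a sketch under that caveat, one common approach is to split the operator out of the query string and then keep only results whose URL contains the given text. The function names, the result shape and the example URLs are all hypothetical.)

```python
PREFIX = "inurl:"


def parse_query(query):
    """Split a raw query into free-text terms and inurl: filter values."""
    terms, url_filters = [], []
    for token in query.split():
        if token.lower().startswith(PREFIX) and len(token) > len(PREFIX):
            url_filters.append(token[len(PREFIX):])
        else:
            terms.append(token)
    return " ".join(terms), url_filters


def filter_by_url(results, url_filters):
    """Keep results whose URL contains every filter value (case-insensitive)."""
    return [
        result for result in results
        if all(f.lower() in result["url"].lower() for f in url_filters)
    ]


# Hypothetical search results for illustration.
RESULTS = [
    {"url": "https://example.com/2013/poutine", "title": "Poutine"},
    {"url": "https://example.com/2020/coffee", "title": "Coffee"},
]

text, filters = parse_query("poutine inurl:2013")
matches = filter_by_url(RESULTS, filters)
print(text, filters, [m["url"] for m in matches])
```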
# capjamesg[d] n=1000
jjuran, KartikPrabhu, [jgmac1106], alex11, tetov-irc, Seirdy and wackycity[d] joined the channel