#dev 2023-03-30

2023-03-30 UTC
#
epoch
I figure it'd make sense to put the "self" acct for the server at some well-known path. host-meta or something.
#
[tantek]
or use an h-card at root with a public key
#
[tantek]
no funny protocol dance needed, just https GET
#
epoch
or hostmaster@domain
lagash joined the channel
#
epoch
use the old RFC about required mailboxes? :P
#
[tantek]
except @-@s / acct: != email
#
[tantek]
better stay 'web-like' than introduce a new use of pre-web protocols
#
epoch
or return an actor object at root when the request uses Accept: application/activity+json ?
#
epoch
probably let people 3xx that to wherever they want
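A minimal sketch of epoch's content-negotiation idea. It assumes a hypothetical `root_response` helper and a hypothetical `/actor` path for the server's own actor. ActivityPub clients asking for ActivityStreams JSON get a 3xx to the actor document, and everyone else gets the normal HTML home page (where tantek's h-card could live). It ignores q-values and does no signature checking.

```python
# Media types an ActivityPub client typically sends in Accept when fetching an actor.
AP_MEDIA_TYPES = {"application/activity+json", "application/ld+json"}


def root_response(accept_header, actor_path="/actor"):
    """Decide how to answer GET / (hypothetical helper).

    Returns (302, actor_path) when the client asks for ActivityStreams JSON,
    otherwise (200, None), meaning "serve the normal HTML home page".
    Simplifications: q-values are ignored (even q=0 counts as a match), and
    application/ld+json matches without checking its profile parameter.
    """
    for part in accept_header.split(","):
        # Drop parameters such as ;q=0.9 or ;profile="..." and normalize case.
        media_type = part.split(";")[0].strip().lower()
        if media_type in AP_MEDIA_TYPES:
            return 302, actor_path
    return 200, None
```

For example, `root_response("application/activity+json")` returns `(302, "/actor")`, while `root_response("text/html")` returns `(200, None)`.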
#
[snarfed]
feel free to chime in on the issue
#
epoch
are they paying people to use it yet?
#
epoch
what do you think the difference is between "low-rate limit" and "low rate-limit"?
#
[snarfed]
10k tweets per month 😂
#
epoch
> For hobbyists or students learning code
#
epoch
> $100/mo
#
[snarfed]
my last estimate for Bridgy was that it fetches 10-30M tweets per month 😬 https://github.com/snarfed/bridgy/issues/1410#issuecomment-1423407497
#
aaronpk
Oh gosh
#
[snarfed]
couldn't help but notice that they immediately closed that post to replies. cowards
#
aaronpk
time to have people bring their own twitter dev credentials to bridgy?
#
[tantek]
hmm, maybe I need to write code to switch my Bridgy Publish of photo posts to directly use the API (that I'm already directly using for plain text notes)
#
[snarfed]
seems like it. I'd pay the $100/mo if I could keep backfeed running. less likely just for publish
#
[snarfed]
so long, and thanks for all the fish
#
[snarfed]
maybe I'll try scraping
#
[tantek]
is it worth considering an open collective for Bridgy at this point?
#
[tantek]
I would help chip in whether or not I used it, because it does help get people started on the path to shifting away from depending on Twitter
#
[snarfed]
see above ^, Bridgy backfeed needs something like 1000x the $100/mo quota. not sure what that would cost, but based on previous news, might start at $42k/mo
#
[snarfed]
Ah, if you meant just for publish, maybe! I'd just switch my current monthly indieweb donation to pay Twitter
geoffo joined the channel
#
[tantek]
yikes to 42k/mo for backfeed 😞
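The gap in rough numbers, using snarfed's estimate of 10-30M tweets fetched per month against the Basic tier's 10k/month read quota:

```latex
\begin{aligned}
\text{Basic share of Bridgy's fetches:}\quad & \frac{10^{4}}{10^{7}} = 0.1\%, \qquad \frac{10^{4}}{3 \times 10^{7}} \approx 0.033\% \\
\text{quota multiple needed for backfeed:}\quad & \frac{10^{7}}{10^{4}} = 1000, \qquad \frac{3 \times 10^{7}}{10^{4}} = 3000
\end{aligned}
```

This is consistent with the "< .1%" figure in the announcement tweet and the "something like 1000x" estimate earlier in the conversation.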
#
epoch
is pgp auth on indielogin.com broken for anyone else?
#
epoch
I'm getting 504s
#
epoch
started refusing actor requests that aren't verified and put in loud logging
lagash, Pablo1 and Xe joined the channel
#
[snarfed]
btw if anyone here is banking on using Twitter's free write-only API going forward, note that it's v2 only, they're turning off the v1.1 API entirely within the next month!
lagash joined the channel
#
[snarfed]
as [aaronpk] suggested earlier, I could consider migrating Bridgy Publish to the v2 Twitter API and have it accept users' API client ids and secrets. for the current Bridgy Publish for Twitter users here, if you're interested in that, please apply for v2 API access (https://developer.twitter.com/en/portal/products/free ) and let me know if you get approved
#
@schnarfed
↩️ Real announcement finally dropped. Three tiers:
* Free, single-user, write-only
* Basic, $100/mo, 10k tweets/mo read quota (< .1% of http://brid.gy's current usage)
* Enterprise, reportedly starts at $42k/mo for higher quotas
https://twittercommunity.com/t/announcing-new-access-tiers-for-the-twitter-api/188728
(twitter.com/_/status/1641299976486813696)
lagash and [jamietanna] joined the channel
#
[jamietanna]
Looks like I've got a v2 app I can use, don't wanna change anything because the Twitter API dashboard shows:
#
[jamietanna]
> To create a new App, you'll need to delete 2 Apps.
#
[jamietanna]
> Quota: 2 of 1 Apps
mro joined the channel
#
IWDiscordRelay
<capjamesg#4492> !archive https://events.indieweb.org/2023/03/homebrew-website-club-europe-london-GqSqrL8kdnzV events/hwc-europe-2023-03-29
#
IWDiscordRelay
<capjamesg#4492> Can someone run that command on IRC?
#
IWDiscordRelay
<capjamesg#4492> Or indeed Slack I think.
#
IWDiscordRelay
<capjamesg#4492> I really need to get Cali working with Discord.
#
epoch
i gotchu
#
epoch
in IRC
#
epoch
or whatever bot that does it just doesn't say anything..
[marksuth], geoffo, mro and wagle joined the channel
#
IWDiscordRelay
<capjamesg#4492> Cali is gone!
#
IWDiscordRelay
<capjamesg#4492> 😦
#
IWDiscordRelay
<capjamesg#4492> I'll fix it later.
#
IWDiscordRelay
<capjamesg#4492> epoch++
#
Loqi
epoch has 2 karma over the last year
mro, wagle, gRegor and IWSlackGateway joined the channel
wagle, [KevinMarks], mro, lagash, [snarfed], [pfefferle] and geoffo joined the channel
#
IWDiscordRelay
<capjamesg#4492> angelo Did you get a chance to dive into Hugging Face?
[manton] joined the channel
#
[manton]
My initial thoughts on Twitter API changes: I’m going to pay Twitter to preserve POSSE from Micro.blog for a limited time, tentatively thinking 2-6 months, to give people time to wind down their use of Twitter. There’s no future with Twitter. A sad but predictable end, I guess.
#
Loqi
[manton] has 25 karma in this channel over the last year (41 in all channels)
#
[snarfed]
[manton]++
#
[snarfed]
unrelated: does anyone know whether/how argument values in a form-encoded POST should be URL-encoded and decoded? eg all characters, or just = and &, or something else? we've been debugging a problem with this in #indieweb-wordpress and https://github.com/pfefferle/wordpress-webmention/issues/359
geoffo joined the channel
#
sknebel
[snarfed]: for keys and values, percent-encode everything "except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_). "
chenghiz_ joined the channel
#
sknebel
(and encode spaces as "+")
#
[snarfed]
sknebel interesting, thanks! that's from a...x-www-form-urlencoded spec?
mro, lagash, [tantek], bret, IWSlackGateway, [schmarty], gRegor, [snarfed], [pfefferle], holiday_medley and [KevinMarks] joined the channel
#
[KevinMarks]
You can also encode spaces as %20; the + is an old thing
#
sknebel
[KevinMarks]: see the link, it's explicitly specified as +
#
sknebel
(yes, %20 likely works too in practice, but form-urlencoded explicitly says +)
#
[KevinMarks]
“an optional boolean spaceAsPlus (default false):”
#
sknebel
and form-encoded explicitly sets that to true
#
sknebel
"Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true."
#
bkil
Embrace the +! It allows for way shorter URLs in practice, since most human-generated text contains way more spaces than pluses or percent signs. That being said, I produce quite dirty escaping in my anchors to cram in as much entropy as possible, but that's not the same as the query that others are also expected to parse.
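A minimal Python sketch of the serializer sknebel is quoting from the WHATWG URL Standard, restricted to UTF-8 (the spec also supports legacy encodings). `form_encode` and `form_encode_component` are hypothetical names, not a library API.

```python
# Bytes left as-is by the application/x-www-form-urlencoded percent-encode set:
# ASCII alphanumerics plus * - . _
SAFE_BYTES = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789*-._")


def form_encode_component(value):
    """Encode one name or value: UTF-8 bytes, space -> '+', everything else -> %XX."""
    out = []
    for byte in value.encode("utf-8"):
        if byte in SAFE_BYTES:
            out.append(chr(byte))
        elif byte == 0x20:
            out.append("+")  # spaceAsPlus is true for form encoding
        else:
            out.append("%{:02X}".format(byte))  # uppercase hex, as in the spec
    return "".join(out)


def form_encode(pairs):
    """Serialize (name, value) tuples the way a form POST body is serialized."""
    return "&".join(
        form_encode_component(name) + "=" + form_encode_component(value)
        for name, value in pairs
    )
```

For example, `form_encode([("note", "a b~c*")])` gives `note=a+b%7Ec*`. Python's own `urllib.parse.urlencode` differs slightly here: it leaves `~` unescaped and percent-encodes `*`, though a conforming decoder yields the same text either way.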
[campegg], gerben and [tantek] joined the channel
#
[tantek]
I thought the + = space thing was only in the query params of URLs, not in the path segments for example
#
bkil
Yes, it only applies to the query (i.e., parts produced by the user agent from your form during submission). You must escape space as %20 elsewhere in the URL where allowed (i.e., in the path and fragment).
#
bkil
But again, in a given domain where I was handling text in the fragment, I'd also opted for the dirty trick of translating spaces to plus there for increased link clarity. It may or may not be friendlier to the dumb tokenizers of web scrapers if SEO is important for you.
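Python's standard library draws the same line bkil describes: `quote` for path segments (space as `%20`) and `quote_plus` / `parse_qs` for form-style query strings (space as `+`). The `/notes/` path and the sample text are made up for illustration.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

text = "hello world"

# Path segment: a space must be %20; a "+" here would mean a literal plus sign.
path = "/notes/" + quote(text)      # "/notes/hello%20world"

# Form-style query string: a space is encoded as "+".
query = "q=" + quote_plus(text)     # "q=hello+world"

# Decoding mirrors this: only the query-style decoders turn "+" back into a space.
assert unquote("hello+world") == "hello+world"
assert unquote_plus("hello+world") == "hello world"
assert parse_qs(query) == {"q": ["hello world"]}
```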
#
capjamesg
Cali...?
#
capjamesg
!archive d d
#
sknebel
wrong channel?
#
capjamesg
:facepalm:
wagle joined the channel
#
capjamesg
Is it reasonable for me to give my bot the context of discussions I have been a part of, instead of just my messages in isolation?
#
bkil
I'm new here, but in our own communities, the poster is the sole owner of their content (posts, attachments and reactions), hence you are only allowed to back up your own content or content that its owner has declared freely licensed. Similarly, your Facebook data export only includes your own comments and photos - and we are talking about the Devil Himself regarding privacy....
#
capjamesg
That's fair.
#
capjamesg
You skipped ahead to what my next question was going to be: and what are the ethical ramifications of doing so?
#
bkil
But also, you can already link to various content on the chat: https://chat.indieweb.org/dev/2023-03-30#t1680207301672700
#
capjamesg
bkil++ for being a step ahead!
#
Loqi
bkil has 1 karma over the last year
#
capjamesg
And welcome (if I haven't said that already!).
#
[snarfed]
that is indeed a common norm in the fediverse. we come a bit more from the open web culture here, which is a bit more permissive toward what's allowed/expected of content published publicly on the web
#
bkil
💜
#
bkil
If you are assembling documentation or extending the wiki, just ping the specific person you want to cite. We do that and they never mind.
#
capjamesg
The situation I was describing is using the text in an AI bot.
#
bkil
I'm working on a secret, backend-optional open social networking project and I strictly differentiate this: on your own properties (e.g., your own domain), only your own content will be rendered statically within the HTML. As it supports weaving conversation threads between follower circles, it dynamically pulls in all your interactions, and as long as they are not deleted, they remain visible to any visitor (-> right to be forgotten). But on public forums of mine, each member must declare within their membership metadata how they choose to license their content (under which license), and whether they allow the forum "hoster" to mirror that content on their own properties (subject to periodic checking of the source for expiry).
#
bkil
Thanks for the link. It is impressive that the more I browse the "wiki", the more I encounter questions & answers that I have been thinking about in the past.
#
capjamesg
It's a question that needs addressing, though.
#
capjamesg
I don't feel good about including context because it's not my content, even though doing so may help improve the bot's accuracy.
#
bkil
It is odd that some (legacy) forums "solved" this by "owning" all content posted there. The only "right to be forgotten" seems to be that they delete your name if you remove your account. Tough luck if personally identifying or embarrassing bits remain deep within such comments. I wish collaboratively edited forum comments with optional edit history were a thing, but I can always dream.
#
gRegor
I feel a bit "eh" (against) about my chat being included in the bot, though haven't thought about it a lot.
#
bkil
capjamesg: Could you perhaps write a script that gathers such snippets, groups them by author and shares each list with its owner for review, so they could just press OK every once in a while? This seems like something that others would also benefit from.
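A rough sketch of bkil's review idea, assuming the candidate snippets are already available as hypothetical `(author, message)` pairs. The approval-message wording is made up, and actually delivering it to each person is left out.

```python
from collections import defaultdict


def group_by_author(entries):
    """Group (author, message) pairs so each author reviews only their own lines."""
    grouped = defaultdict(list)
    for author, message in entries:
        grouped[author].append(message)
    return dict(grouped)


def review_request(author, messages):
    """Build a plain-text approval request for one author (hypothetical format)."""
    lines = [f"{author}: OK to use these lines as bot context? Reply OK or NO."]
    lines += [f"  - {message}" for message in messages]
    return "\n".join(lines)


# Example lines taken from this log.
entries = [
    ("gRegor", "Loqi used to do it"),
    ("sknebel", "wrong channel?"),
    ("gRegor", "Oh yeah, understood"),
]
for author, messages in group_by_author(entries).items():
    print(review_request(author, messages))
```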
wagle joined the channel
#
capjamesg
gRegor Don't worry, I'm not doing this. It feels like a good discussion to get on the record.
#
gRegor
Oh yeah, understood
#
bkil
And of course you could have a wiki (or just the user's profile page?) where each user could declare upfront: "feel free to do whatever you want with what I write publicly". I know quite a few such people.
#
bkil
Wait, are you trying to train an AI FAQ assistant based on the whole chat log indiscriminately?
#
[snarfed]
another distinction is whether the bot itself is public or private. capjamesg if you only used this chat-trained version of the bot yourself privately, it'd probably be more acceptable than if you exposed it publicly, like https://jamesg.blog/bot/ currently is
#
bkil
Full disclosure: just found an old homepage that I can be embarrassed about much more than I remembered. I'm starting to consider arranging for access to update it...
#
[snarfed]
(especially if those chats aren't themselves public. chat that's currently archived publicly, probably more ok to train and expose publicly, at least if you subscribe to the open web norms)
#
bkil
Again, the question may be moot as the chat is already publicly indexed (https://indiechat.search.cweiske.de/?q=ChatGPT). I have mixed feelings about it. As long as it's not my content, I'm still amused by the implementation: how snappy both the chat log and the search engine are, how well it works without JavaScript, seeing what great conversations you had in the past, deep technical insights and the lack of spam. But then I think about appearing here myself, and then it hits me that this platform doesn't even allow for correcting typos. 😼
#
bkil
And yeah, surely they've also trained ChatGPT on huge swaths of web crawls over the years (usually respecting robots.txt), so a bunch of indieweb content will already be present in its "brain".
[jeremycherfas] joined the channel
#
capjamesg
#chat is not public.
#
capjamesg
IRC logs there aren't put on the public chat logs.
#
capjamesg
"this platform doesn't even allow for correcting typos" XD
#
[KevinMarks]
There was a bot that understood the s/// syntax at one point
#
capjamesg
What would it do?
#
capjamesg
amend the logs?
#
[snarfed]
Zakim, still, right?
#
gRegor
Loqi used to do it
#
gRegor
iirc it was basically `sed`, Loqi would repeat the line with the fix
#
[tantek]
RRSAgent understands s/x/y/g
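Roughly what such a bot does with an `s/old/new/` message, as a hypothetical sketch (not Loqi's or RRSAgent's actual code). It treats `old` as literal text rather than a regex, supports only the `g` flag, and does not handle escaped slashes.

```python
import re

# s/old/new/ or s/old/new/g ; "old" must be non-empty, "new" may be empty.
SUBSTITUTION = re.compile(r"^s/([^/]+)/([^/]*)/(g?)$")


def apply_correction(previous_line, message):
    """Return previous_line with the correction applied, or None if message isn't one."""
    match = SUBSTITUTION.match(message.strip())
    if match is None:
        return None
    old, new, flag = match.groups()
    # str.replace with count=-1 replaces every occurrence; count=1 only the first.
    count = -1 if flag == "g" else 1
    return previous_line.replace(old, new, count)
```

A bot would then repeat the returned line in the channel, which matches how gRegor remembers Loqi doing it.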
#
[tantek]
capjamesg, I'd say no to training your AI on chat, too vulnerable to lots of noise polluting its model
#
bkil
capjamesg: I wouldn't design a communication platform without a way to be forgotten, nor without a means to build a database for future reference, discoverability and leisure. Without a way to be forgotten, one has to constantly review their own lines before submitting, for fear of being confronted in an HR interview with their mistakes or things they have since changed their mind about. This is self-censorship at its worst. https://en.wikipedia.org/wiki/Self-censorship
#
[tantek]
bkil, that, in brief, is IRC though
#
bkil
Correction & redaction comes standard on Matrix, XMPP, FluxBB/Discourse (and I think basically all forum engines above 1k SLOC), better Fediverse servers such as Friendica/Hubzilla and a few non-federated, but also established FOSS messengers as well.
#
bkil
Yes, IRC has its uses for rapid exchange or timely action on a signal (such as monitoring alerts, a doorbell, etc.). But for serious professional discourse, I find it quite dated due to the above (and quite a few other boring drawbacks). I mean, we have VoIP conferencing tools today that are way faster than typing (although less accessible to some and not commonly indexable right now).
#
capjamesg
[tantek] not training, referencing.
#
capjamesg
Data could be used as a source in a prompt, not in the underlying model.
#
capjamesg
Training is a big no no.
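A sketch of the distinction capjamesg draws between referencing and training. It uses naive keyword overlap as a stand-in for whatever retrieval his bot actually uses (real systems often use embeddings); the snippet dicts, URLs and prompt wording are hypothetical. Chat lines are pasted into a single request as cited sources, and the model's weights are never changed.

```python
def tokenize(text):
    """Lowercased words with surrounding punctuation stripped (very naive)."""
    return {w.strip(".,?!:;\"'()").lower() for w in text.split()} - {""}


def top_snippets(question, snippets, k=2):
    """Return up to k snippets sharing the most words with the question (ties keep order)."""
    q = tokenize(question)
    scored = sorted(snippets, key=lambda s: len(q & tokenize(s["text"])), reverse=True)
    return [s for s in scored[:k] if q & tokenize(s["text"])]


def build_prompt(question, snippets, k=2):
    """Put the retrieved lines into the prompt as cited sources; nothing is trained."""
    lines = ["Answer using only these sources, and cite their URLs:"]
    lines += [f"[{s['url']}] {s['text']}" for s in top_snippets(question, snippets, k)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```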
#
bkil
And we also bridge a few of our Matrix/XMPP support rooms to IRC as well by popular demand, but those are also more about providing quick tech support.
#
[tantek]
capjamesg, you may be interested in discussions of how to treat /reply-context content as well which is from outside sources
#
bkil
Interest in chat bots seems to come and go in waves over the decades (and over a fractal of timescales down to days). One guy just hooked up ChatGPT in a room (Delta.Chat) and a bunch of people started to play with it non-stop. I challenged them to ask it any meaningful question, and its output was vague at best and 50% wrong in the worst case. I'm undecided whether offering no support to a newbie is better than offering them misleading support in half the cases. However, we've decided to let it stay for an evaluation period - at least it makes the atmosphere less dull for a while.
#
capjamesg
bkil Say !jamesbot <query> in #chat.
#
capjamesg
Replace <query> with your question.
wagle joined the channel
#
bkil
Unfortunately it didn't let me join right now, maybe I can try tomorrow. Although, I'd probably be more interested in seeing the full list of _uncurated_ input & output that others have already tested it with over there than to rob you of your tokens with my fuzzing. Too bad the chat channel is not logged...
#
capjamesg
You can use the web UI.
#
gRegor
Couldn't join #indieweb-chat?
[chrisbergr] joined the channel
#
bkil
capjamesg++ for asking the uncomfortable questions that others aren't willing to ask, for innovating and for providing a public PoC
#
Loqi
capjamesg has 24 karma in this channel over the last year (83 in all channels)
#
gRegor
It's intentionally not logged https://indieweb.org/discuss#chat
#
bkil
gRegor yes, I've read that
#
bkil
Probably doesn't allow the matrix bridge to join it for that reason as well, but never mind that.
#
bkil
Also thanks to all for having me here. It's getting late so got to hit the sack but will be following the log (where available) 👋
#
capjamesg
There's probably an IndieWebber awake at any time of the day :D
#
capjamesg
what time is it for capjamesg?
#
Loqi
In capjamesg's timezone, Europe/London, it is currently 10:34pm on March 30
#
gRegor
Glad to have you, have a good night bkil
#
gRegor
what time is it for Loqi
#
Loqi
In Loqi's timezone, America/Los_Angeles, it is currently 2:36pm on March 30
[James_Van_Dyne] joined the channel