#Loqicapjamesg has 47 karma in this channel over the last year (185 in all channels)
#[tantek]capjamesg, is Python code readily runnable in a Browser Add-on? Or would the code need to be translated to JavaScript?
#capjamesg[tantek] I'd need to translate it. I'm more comfortable in Python.
#capjamesgI just added more of the Wikipedia disinformation lists, too.
#capjamesgRegarding the classification, I think the best approach would be to have a pre-classified list of Wikipedia categories so the browser can do a lookup.
#capjamesgYou can technically do classification on the fly with a library like transformers.js but it is wasteful to load a model and compute vectors in the browser when all you need is to know if a Wikipedia category is positive or negative.
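A minimal sketch of the lookup approach described above, assuming the categories have already been labelled offline. The table contents and the names `PRECLASSIFIED`, `lookup_categories`, and `has_negative` are hypothetical placeholders, not part of any existing package.

```python
# Hypothetical pre-classified table: Wikipedia category -> "negative" or "positive".
# The category names are illustrative placeholders, not a real curated list.
PRECLASSIFIED = {
    "Category:Example flagged category": "negative",
    "Category:Example neutral category": "positive",
}


def lookup_categories(categories, table=PRECLASSIFIED):
    """Return the pre-computed label for each known category; unknown ones are skipped."""
    return {c: table[c] for c in categories if c in table}


def has_negative(categories, table=PRECLASSIFIED):
    """True if any of the given categories was pre-classified as negative."""
    return "negative" in lookup_categories(categories, table).values()
```

The lookup is a plain dictionary access, so no model or vectors need to be loaded at lookup time.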
#[tantek]totally agree - no need for a model / vectors and possibly would give inaccurate results too
#[tantek]With Wikipedia you're at least providing some degree of consensus human review results
#capjamesgI think a classification model is ideal for a first pass of the Wikipedia categories at a low threshold, depending on how many there are. Then human review of anything below a certain threshold.
#capjamesgUsing the classification model I'm using right now -- a public model fine-tuned on DistilBERT -- I am getting good results with an 80% confidence threshold.
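A rough sketch of the first-pass-then-review split, assuming the model output is already available as `(category, label, confidence)` tuples. The function name `triage` and the input shape are assumptions; only the 80% threshold comes from the discussion.

```python
def triage(scored, threshold=0.80):
    """Split model output into auto-accepted labels and items needing human review.

    `scored` is an iterable of (category, label, confidence) tuples, with
    confidence in the range 0.0 to 1.0 (an assumed input shape).
    """
    accepted, review = [], []
    for category, label, confidence in scored:
        if confidence >= threshold:
            accepted.append((category, label))
        else:
            # Anything below the threshold goes to a human reviewer.
            review.append((category, label))
    return accepted, review
```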
#[tantek]oh do you mean classifying the categories themselves? rather than the domains/articles?
#capjamesgI'd envision a browser extension to have a Red, Green, or Grey icon that shows depending on whether a flag has or has not been raised, or if no information is available about a page.
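One way the three icon states could be modelled, as a sketch. The names `Icon` and `icon_for` are hypothetical, and `None` is an assumed way to represent "no information available".

```python
from enum import Enum
from typing import Optional


class Icon(Enum):
    RED = "red"      # a flag has been raised
    GREEN = "green"  # information exists and no flag was raised
    GREY = "grey"    # no information is available about the page


def icon_for(flag_raised: Optional[bool]) -> Icon:
    """Map a lookup result to an icon; None means nothing is known about the page."""
    if flag_raised is None:
        return Icon.GREY
    return Icon.RED if flag_raised else Icon.GREEN
```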
#capjamesgI think we're on the same page with that.
#[tantek]from what I can tell, no one else had previously proposed using Wikipedia look up of domain names in an automated manner like that
#capjamesgThis implements our algorithm -- minus the n-day consensus -- with lookups on Wikipedia's reputable lists, and, potentially, third-party, authoritative lists of fake news websites.
#capjamesgI found some site that was charging for an API for this 😦
#[tantek]it's that plus the human-curated "bad list" of Categories to look for that make this work IMO
#[tantek]I don't trust "third-party, authoritative lists of fake news websites" to be "authoritative" — because the process by which it is done is not open
#capjamesgActually, correction: "We are very selective in who we allow access to our API. Please fill out the form below to receive permission or a quote to use our API."
#[tantek]I wouldn't use any "closed" list for this. Too easy for a private bad actor to silently mess with it
#capjamesgGo make dinner. I'll sleep. But I'd love to explore this further! I may turn my Python code into a package so it could be integrated into web app back-ends. That's a task for another day!
ttybitnik and geoffo joined the channel
#[Al_Abut][aciccarello] and @btrem - I’m late to the discussion but yes, you can skip the <picture> element and just use <img> while serving up different sizes. Just use the <img> tag like usual and use srcset with different widths.
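To illustrate the `srcset` suggestion, here is a small sketch of a helper that renders such an `<img>` tag from a list of image widths. The function name `img_with_srcset` and the file names are made up for the example.

```python
from html import escape


def img_with_srcset(alt, sources, sizes="100vw"):
    """Render an <img> tag with width descriptors.

    `sources` is a list of (url, width_in_px) pairs, smallest first; the first
    URL doubles as the `src` fallback for browsers without srcset support.
    """
    srcset = ", ".join(f"{url} {width}w" for url, width in sources)
    fallback = sources[0][0]
    return (
        f'<img src="{escape(fallback)}" srcset="{escape(srcset)}" '
        f'sizes="{escape(sizes)}" alt="{escape(alt)}">'
    )


print(img_with_srcset("A cat", [("cat-480.jpg", 480), ("cat-960.jpg", 960)]))
```

The browser then picks the candidate that best fits the layout width given by `sizes`.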
geoffo, Tiffany and scottishstoater joined the channel
#capjamesg[tantek] I have added consensus logic to the Python implementation. You can choose between four strategies: percentage, majority, unanimous, or in one or more days.
#capjamesgCategories are retrieved once per day, cached, then passed through the consensus logic.
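A sketch of how the four strategies might behave, assuming each daily snapshot is reduced to a boolean (whether a flagged category was present that day). This is an illustrative interpretation, not the published package's API; the function name `consensus`, the strategy key `"any"`, and the default threshold are assumptions.

```python
def consensus(observations, strategy="majority", threshold=0.8):
    """Decide whether daily observations agree strongly enough to raise a flag.

    `observations` is a list of booleans, one per cached daily snapshot.
    """
    if not observations:
        return False  # no data, so no flag
    hits = sum(observations)
    total = len(observations)
    if strategy == "percentage":
        return hits / total >= threshold
    if strategy == "majority":
        return hits > total / 2
    if strategy == "unanimous":
        return hits == total
    if strategy == "any":  # flagged on one or more days
        return hits >= 1
    raise ValueError(f"unknown strategy: {strategy}")
```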
#capjamesgThe package is now on PyPI and available for download. The docs aren't live on a website yet, but I'll get around to that at some point.
scottishstoater, teder_[d] and chimo joined the channel
#capjamesg[tantek] I don’t think extensions can make requests to the Wikipedia API because of CORS?
ttybitnik, scottishstoater, [Ros], gRegor and [dmitshur] joined the channel
#[dmitshur]I've realized recently that git itself doesn't seem to enforce the `user.email` value; it being an email is a convention, but otherwise it can be set to a URL. I'm curious if anyone has considered taking advantage of that and actually using a URL (instead of email) as their git author identity, for the same reasons that are motivated at https://indieweb.org/Why_web_sign-in#Why_not_email.
#[dmitshur]The reason I'm thinking about this now is because I'm working on letting people send changes to the git server on my site, and I authenticate and identify them via a URL. I don't track users' emails (and don't want to track them), so if git commits use email addresses, it's harder to associate them with a user. If git commits happened to use URLs, it'd be trivial for me to automatically verify that a person who authenticated as
#[dmitshur]I guess most people would not want to use a different git author identity only in some contexts, and email has the benefit of being the widely accepted default that'll work everywhere without any surprises (GitHub, Gerrit, GitLab, random people's personal websites, etc.). So someone would have to be very very motivated to consider switching away from email as their preferred git author identity.
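As a sketch of the kind of check a URL-based author identity would allow, the snippet below compares the value found in a commit's author email field with the URL a user signed in with. The normalization rules and the names `normalize` and `author_matches` are assumptions made for illustration, not a description of any existing server.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Return a comparable form of an http(s) URL, or None if it is not one."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # e.g. a conventional email address
    path = parts.path.rstrip("/")
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"


def author_matches(author_field, authenticated_url):
    """True if the commit's author identity is the same URL the user signed in with."""
    a, b = normalize(author_field), normalize(authenticated_url)
    return a is not None and a == b
```

A conventional email address fails the URL check, so such commits would simply not match any signed-in URL.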
scottishstoater and ttybitnik joined the channel
#[Ros]Hey pals, does anyone have any recommendations for YouTube episodes/series/creators who discuss the history of HTML, CSS, and JavaScript?
#[Ros]E.g. I just went through https://youtu.be/NzzGt7EmXVw?si=W1DZY1KH3n-mTcB0 that covers why and how HTML is formatted as it is today ([Jeremy_Keith]’s book is mentioned in episode 2), and I’m finding it so helpful to understand the background. So many “like duhhhh” moments hitting for why we do things as we do. I’m looking for more good episodes on HTML, CSS + JavaScript if anyone can recommend 🙏
#[KevinMarks]Jeremy has given a lot of good talks on HTML - search for those
#[Ros]↩️ Thanks [KevinMarks]! I actually have *6* talks of his lined up to binge-watch this weekend 😬
#[Ros]↩️ If someone asks me in the future about starting programming, I would recommend the same to them: learn about the history alongside learning the skills. It’s making everything so much clearer
#[Ros]↩️ [KevinMarks] Oh this is _amazing_ New bookmark!
#capjamesgI have learned a lot about the web from thehistoryoftheweb.com. I'm glad there are people like Jay who are writing such intuitive historical summaries.
#[Ros]↩️ This site looks so special. I’m glad it exists. I’ll write to him to say thank you after I read this whole book
#[Ros]↩️ Or [Jay_Hoffmann] I should say thanks to you and [Jeremy_Keith] in advance. I’m so glad you’ve made this site. My friends are not going to see me the next wee while as I gorge this
#capjamesgAlso, I’d love your thoughts on the web extension / trust thing I shared earlier [tantek]!
#[tantek]Two things: 1) pretty sure Browser Add-ons can make arbitrary https requests (e.g. to update their lists of blocked sites, for ad blockers etc.),
#capjamesgAh, I had this made as a web page for testing. I didn’t realise web extensions had different permissions.
#[tantek]and 2) naming, I think "trust" is the wrong framing and frankly misleading. the absence of a warning does NOT (must NOT) imply any degree of "trust" of the contents of a site — it is far more likely that there is an absence of information about the site. Better to name exactly what it does, which is identify *some* sources of misinformation. A browser add-on would likely also provide a warning
#[tantek]so I said: "name exactly what it does, which is identify *some* sources of misinformation" which you could easily shorten to something simple / boring like
#capjamesgI have honestly been a bit hesitant to say disinformation because I know it’s a controversial thing. But that’s more of a personal barrier since I don’t usually deal with things of such controversy.
#[tantek]especially for programming libraries it is much better to ALWAYS name exactly what it does, and nothing more. no exaggerations, no flowery words
#[tantek]no it is not controversial, you are looking up consensus Categorization on Wikipedia
#[tantek]the assertion that disinformation is "controversial" is itself IMO a light form of misinformation
#[tantek]people who lie have marketed the idea that "disinformation is controversial" as an attempt to get more consideration of their lies
#[tantek]controversy implies equivalency, of which there is none in disinformation "debates"
#capjamesgBut yeah, I’ll make the requisite naming updates. If you can come up with more concise copy for the intro in the README, feel free to submit a PR; otherwise, I’ll give it another go over the coming days.
#[tantek]capjamesg that article is tangentially about misinformation at best. it's more about whistleblowing, claims of funding contorting academic "honesty", and inconsistencies of apparent levels of academic freedom offered to employees of different positions (staff vs faculty)
#[tantek]capjamesg, sounds good. I may look at making a PR for the README on Monday if that's ok (remind me of the URL for the README?)