capjamesg: Regarding the classification, I think the best approach would be to have a pre-classified list of Wikipedia categories so the browser can do a lookup.
capjamesg: You can technically do classification on the fly with a library like transformers.js, but it is wasteful to load a model and compute vectors in the browser when all you need to know is whether a Wikipedia category is positive or negative.
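A minimal sketch of that lookup approach in Python; the file name, its format, and the label values are assumptions for illustration, not part of any existing tool:

```python
import json

# Load a pre-classified mapping produced offline, e.g.
# {"Fake news websites": "negative", "Newspapers published in Scotland": "positive"}
with open("category_labels.json") as f:
    CATEGORY_LABELS = json.load(f)

def lookup(category: str) -> str:
    """Return "positive", "negative", or "unknown" for a Wikipedia category."""
    return CATEGORY_LABELS.get(category, "unknown")
```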
capjamesg: I think a classification model is ideal for a first pass of the Wikipedia categories at a low threshold, depending on how many there are. Then human review of anything below a certain threshold.
capjamesg: Using the classification model I'm using right now -- a public model fine-tuned on DistilBERT -- I am getting good results with an 80% confidence threshold.
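For illustration, a hedged sketch of that first pass using the Hugging Face transformers pipeline: accept the model's label at or above 80% confidence and queue everything below it for human review. The model id is a placeholder; the actual fine-tuned model isn't named in this discussion.

```python
from transformers import pipeline

# Placeholder model id; substitute the actual DistilBERT fine-tune.
classifier = pipeline("text-classification", model="example/distilbert-category-classifier")

def triage(categories, threshold=0.80):
    auto_labeled, needs_review = {}, []
    for category, result in zip(categories, classifier(categories)):
        if result["score"] >= threshold:
            auto_labeled[category] = result["label"]
        else:
            needs_review.append(category)  # below threshold: human review
    return auto_labeled, needs_review
```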
capjamesg: I'd envision a browser extension with a red, green, or grey icon, shown depending on whether a flag has been raised, no flag has been raised, or no information is available about a page.
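A toy version of that icon logic, assuming the lookup yields True (flagged), False (checked, no flag), or None (no information); the state names are illustrative only:

```python
from typing import Optional

def icon_state(flagged: Optional[bool]) -> str:
    """Red = a flag was raised, green = no flag raised, grey = no information."""
    if flagged is None:
        return "grey"
    return "red" if flagged else "green"
```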
capjamesg: This implements our algorithm -- minus the n-day consensus -- with lookups against Wikipedia's reputable lists and, potentially, third-party authoritative lists of fake news websites.
[tantek]: I don't trust "third-party, authoritative lists of fake news websites" to be "authoritative" — because the process by which they are compiled is not open
capjamesg: Actually, correction: "We are very selective in who we allow access to our API. Please fill out the form below to receive permission or a quote to use our API."
capjamesg: Go make dinner. I'll sleep. But I'd love to explore this further! I may turn my Python code into a package so it could be integrated into web app back-ends. That's a task for another day!
[Al_Abut]: [aciccarello] and @btrem - I'm late to the discussion but yes, you can skip the <picture> element and just use <img> while serving up different sizes: use the <img> tag like usual and add srcset with different widths.
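For illustration, a srcset-only version of that markup (file names and breakpoints are made up):

```html
<img src="photo-800.jpg"
     srcset="photo-400.jpg 400w, photo-800.jpg 800w, photo-1600.jpg 1600w"
     sizes="(max-width: 600px) 100vw, 800px"
     alt="Description of the photo">
```

The browser picks the best candidate from srcset based on the sizes hint and device pixel density, with src as the fallback.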
capjamesg: [tantek] I have added consensus logic to the Python implementation. You can choose between four strategies: percentage, majority, unanimous, or consensus over one or more days.
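A hedged sketch of what those four strategies could look like; the function name, signature, and vote representation are my guesses, not the actual API of the Python implementation:

```python
from datetime import datetime, timedelta

def consensus(votes, strategy="majority", percentage=0.6, days=1):
    """votes: list of (timestamp: datetime, flagged: bool) pairs."""
    if not votes:
        return False
    flagged = [is_flagged for _, is_flagged in votes if is_flagged]
    if strategy == "percentage":
        return len(flagged) / len(votes) >= percentage
    if strategy == "majority":
        return len(flagged) > len(votes) / 2
    if strategy == "unanimous":
        return len(flagged) == len(votes)
    if strategy == "n_days":
        # Flag only if every vote within the last `days` days agrees.
        cutoff = datetime.now() - timedelta(days=days)
        recent = [is_flagged for ts, is_flagged in votes if ts >= cutoff]
        return bool(recent) and all(recent)
    raise ValueError(f"unknown strategy: {strategy}")
```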
[dmitshur]: I've realized recently that git itself doesn't seem to enforce the `user.email` value; it being an email is a convention, but it can just as well be set to a URL. I'm curious if anyone has considered taking advantage of that and actually using a URL (instead of an email) as their git author identity, for the same reasons motivated at https://indieweb.org/Why_web_sign-in#Why_not_email.
[dmitshur]: The reason I'm thinking about this now is because I'm working on letting people send changes to the git server on my site, and I authenticate and identify them via a URL. I don't track users' emails (and don't want to track them), so if git commits use email addresses, it's harder to associate a commit with a user. If git commits happened to use URLs, it'd be trivial for me to automatically verify that a person who authenticated as a URL is the author of a given commit.
[dmitshur]: I guess most people would not want to use a different git author identity only in some contexts, and email has the benefit of being the widely accepted default that'll work everywhere without any surprises (GitHub, Gerrit, GitLab, random people's personal websites, etc.). So someone would have to be very, very motivated to consider switching away from email as their preferred git author identity.
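A sketch of the verification [dmitshur] describes, assuming commits carry a URL in the author "email" field (the helper names are hypothetical; `%ae` is git's standard format placeholder for the author email):

```python
import subprocess

def commit_author(ref: str = "HEAD") -> str:
    # %ae prints the author "email" field, which git does not require
    # to actually be an email address.
    return subprocess.run(
        ["git", "log", "-1", "--format=%ae", ref],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def author_matches(authenticated_url: str, ref: str = "HEAD") -> bool:
    # If users authenticate with a URL and commit with that same URL as
    # their author identity, verification is a string comparison.
    return commit_author(ref) == authenticated_url
```

The identity itself would be set with, e.g., `git config user.email "https://example.com/"`.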
[Ros]: E.g. I just went through https://youtu.be/NzzGt7EmXVw?si=W1DZY1KH3n-mTcB0, which covers why and how HTML is formatted as it is today ([Jeremy_Keith]'s book is mentioned in episode 2), and I'm finding it so helpful for understanding the background. So many "like duhhhh" moments hitting for why we do things as we do. I'm looking for more good episodes on HTML, CSS + JavaScript if anyone can recommend 🙏
[Ros]: ↩️ If someone asks me in the future about starting programming, I would recommend the same: learn about the history alongside learning the skills. It's making everything so much clearer.
capjamesg: I have learned a lot about the web from thehistoryoftheweb.com. I'm glad there are people like Jay who are writing such intuitive historical summaries.
[Ros]: ↩️ Or [Jay_Hoffmann], I should say. Thanks to you and [Jeremy_Keith] in advance. I'm so glad you've made this site. My friends are not going to see me for the next wee while as I gorge on this.
[tantek]: Two things: 1) pretty sure browser add-ons can make arbitrary https requests (e.g. to update their lists of blocked sites, for ad blockers etc.),
[tantek]: and 2) naming: I think "trust" is the wrong framing and frankly misleading. The absence of a warning does NOT (must NOT) imply any degree of "trust" of the contents of a site — it is far more likely that there is simply an absence of information about the site. Better to name exactly what it does, which is identify *some* sources of misinformation. A browser add-on would likely also provide a warning
[tantek]: so I said: "name exactly what it does, which is identify *some* sources of misinformation", which you could easily shorten to something simple / boring like
capjamesg: I have honestly been a bit hesitant to say "disinformation" because I know it's a controversial thing. But that's more of a personal barrier, since I don't usually deal with things of such controversy.
[tantek]: Especially for programming libraries, it is much better to ALWAYS name exactly what it does, and nothing more. No exaggerations, no flowery words.
capjamesg: But yeah, I'll make the requisite naming updates. If you can come up with more concise copy for the intro in the README, feel free to submit a PR; otherwise, I'll give it another go over the coming days.
[tantek]: capjamesg that article is tangentially about misinformation at best. It's more about whistleblowing, claims of funding contorting academic "honesty", and inconsistencies in the apparent levels of academic freedom offered to employees in different positions (staff vs faculty).