2013-05-25 UTC
tantek joined the channel
# 00:15 aaronpk holy cow, I've already downloaded 63 megs of HTML from external pages in my reply context code
# 00:16 tantek aaronpk - perhaps we could add cheap archiving to business-models :)
# 00:16 tantek or maybe we should just shove archive HTML pages into static github pages
# 00:17 tantek wonders if that can be done programmatically with a reasonable URL structure
# 00:18 tantek so what's a short github user name we can create just for this sake
# 00:18 Loqi !calc (and encourage people to mirror
# 00:18 tantek we could all share the same github user/community
# 00:18 tantek and since we keep snapshots by datetime index… we won't collide
# 00:20 tantek plus I'd suggest, ahem, NewBase60 encoding of epoch days + t + NewBase60 seconds into the day
# 00:20 aaronpk that tends to be harder to handle for people though, cause it's not built in to every language's date library
# 00:21 aaronpk hm there is a ruby gem. we should get a composer package up for PHP
# 00:21 aaronpk you kind of lose the readability of the URLs though
# 00:22 tantek aaronpk - I think there is a composer package setup for CASSIS.js which has the PHP implementation :)
# 00:22 tantek I used to think they were readable like 10 years ago
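The identifier suggested above (NewBase60 epoch days, then `t`, then NewBase60 seconds into the day) can be sketched in Python as follows. This is a minimal illustration, not the CASSIS implementation: the function names `to_newbase60` and `snapshot_id` are made up here, and the lack of zero padding is an assumption.

```python
from datetime import datetime, timezone

# NewBase60 alphabet: digits, uppercase without I and O, underscore,
# lowercase without l (60 characters in total).
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"


def to_newbase60(n: int) -> str:
    """Encode a non-negative integer in NewBase60 (hypothetical helper name)."""
    if n < 0:
        raise ValueError("NewBase60 is only defined here for non-negative integers")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, d = divmod(n, 60)
        digits.append(ALPHABET[d])
    return "".join(reversed(digits))


def snapshot_id(moment: datetime) -> str:
    """Build '<epoch days>t<seconds into the day>' for a timezone-aware datetime.

    Assumption: no zero padding, and the day boundary is taken in UTC.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = moment.astimezone(timezone.utc) - epoch
    return f"{to_newbase60(delta.days)}t{to_newbase60(delta.seconds)}"


if __name__ == "__main__":
    print(snapshot_id(datetime(2013, 5, 25, 0, 20, tzinfo=timezone.utc)))
```

Note that `t` is itself a NewBase60 digit, so a parser would have to rely on the fixed width of the day part (three characters for dates in this era) rather than on searching for the separator.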
# 00:23 aaronpk so... github.com/webarchive? would be webarchive.github.io/
# 00:23 tantek aside: appears if you force-quit iTunes it will no longer see iPods until you restart?
# 00:24 tantek people are squatting the single letter github accounts
# 00:32 tantek plus we place no copyright claims and state it is purely for library/archive purposes only, and anyone is welcome to clone (not modify) and keep the same terms
# 00:32 tantek once we get it going, we could even likely talk archive.org into maintaining a mirror of it
scor joined the channel
# 00:33 tantek this is the kind of stuff that you end up figuring out / building when you scratch your own itches and follow them to their logical conclusions
# 00:33 tantek I don't think anyone in FSW circles came up with the idea of a distributed collaborative mirroring of posts that they replied to!
# 00:34 tantek performance shouldn't be an issue either - as rarely should we need to reparse the HTML
# 00:34 aaronpk everyone was so focused on building a full stack that did everything
# 00:34 tantek instead of distributed systems that cooperatively grew
# 00:35 tantek I think I may still store parsed JSON bits for speed of retrieval / display on my server, but then keep a URL to the copy of the HTML on i-a
# 00:39 aaronpk wondering how long it takes github to build the gh-pages site or if I did this right
# 00:40 aaronpk "Changes may take up to ten minutes to be visible." oh
# 00:47 tantek archiving the web that the indieweb deems worth linking to
# 00:47 tantek wouldn't surprise me if someone asks for a feed of all the URLs
# 00:48 tantek we build up the write-access contributors through social web of trust from the initial seed of indieweb camp attendees
# 00:48 tantek that have their own indieweb sites setup - whatever they used to sign-into indiewebcamp.com
# 01:01 tantek ok well that was an unexpected brainstorm for end of the week / Friday afternoon when we should be mentally burned out and stuff
# 01:02 tantek aaronpk - wouldn't have happened without you venting about your itch, and the idea popping into my head as a result. I don't think either of us would have thought of this purely independently, at least not for a while
# 01:03 aaronpk not bad, less than an hour from itch to scratch to full wiki page with docs and a simple live site
# 01:07 tantek so I can just paste in functions like ymd_to_sd() and execute them
# 01:08 tantek e.g. 1. go tantek.com, 2. click JSEval, 3. type in CASSIS function to run it and get a value
# 01:08 aaronpk i just open up the chrome console (command+option+j)
# 01:09 tantek I just wanted a *really* simple little alert box
spinnerin joined the channel
# 01:26 tantek aaronpk - note that another side effect of the modified URL structure is that you can look at what was archived for a particular day
# 01:28 aaronpk yes, and the nice thing about that is the root folder will only ever have as many folders as there are days archived
# 01:28 aaronpk rather than the archive.org where each full second-precision date is its own folder
# 01:30 tantek yeah - might even be manageable in a "normal" local laptop filesystem for browsing
# 01:30 aaronpk using the same format p3k uses to store right now, the subfolders will also be neatly organized since the only thing in the XXX/YYY folder would be the domain name
# 01:32 tantek after an initial dump, e.g. with your and Barnaby's HTML archives of your reply-to originals, we can even lock down old directories
# 01:32 tantek to prevent unintentional modifications to the past
# 01:32 tantek or maybe we only allow new submissions to the past day and into the future.
# 01:33 tantek that's a reasonable time stamp trust granularity - we can also check commit log timestamps vs. claimed directory timestamps to determine archiving lag
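The two checks discussed above (accept submissions only for the past day onward, and compare commit timestamps against the claimed directory day) could look roughly like this. It is a sketch under assumptions: the function names are hypothetical, days are taken in UTC, and lag is measured from the start of the claimed day.

```python
from datetime import date, datetime, timedelta, timezone


def accepts_submission(claimed_day: date, now: datetime) -> bool:
    """Allow writes only to yesterday's directory or later (UTC days).

    `now` must be timezone-aware.
    """
    today = now.astimezone(timezone.utc).date()
    return claimed_day >= today - timedelta(days=1)


def archiving_lag(claimed_day: date, commit_time: datetime) -> timedelta:
    """Time between the start of the claimed day and the commit timestamp.

    With day-level directories, any lag under one day is indistinguishable
    from "archived on the claimed day".
    """
    day_start = datetime(
        claimed_day.year, claimed_day.month, claimed_day.day, tzinfo=timezone.utc
    )
    return commit_time.astimezone(timezone.utc) - day_start


if __name__ == "__main__":
    commit = datetime(2013, 5, 25, 1, 33, tzinfo=timezone.utc)
    print(accepts_submission(date(2013, 5, 24), commit))
    print(archiving_lag(date(2013, 5, 24), commit))
```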
# 01:34 tantek oh my goodness I need to get on my bike and get home before I keep coming up with more on this - I'll think on the bike ride home ;)
# 01:34 tantek always a pleasure collaborating with you aaronpk
tantek and mxuribe joined the channel
# 02:54 aaronpk relating to access control for IndieArchive... the fact that it's on Github will be really convenient
# 02:55 aaronpk anybody is able to fork the project right now, and can always be adding files to their own branch
# 02:55 aaronpk after they have a number of successful commits, and we see that their site behaves according to the project guidelines we set up, we can begin accepting pull requests
# 02:56 tantek we still have to solve the HTTP headers + HTML file problem
# 02:56 aaronpk we can store headers in a .headers file or something
# 02:56 tantek especially if we expect to donate this to archive.org longer term
# 02:57 aaronpk so there'd be two files on disk... 3_C/3Kj/aaronparecki.com/index.html and 3_C/3Kj/aaronparecki.com/index.html.headers
# 02:58 tantek that seems reasonable - I don't have any better ideas
# 02:58 aaronpk marked as a hidden file, so it doesn't show up in the file browser UIs
# 02:59 aaronpk or, outside the main folder structure entirely: headers/3_C/3Kj/aaronparecki.com/index.html
# 03:00 aaronpk at least that way we know filenames will never conflict
# 03:01 tantek that's why they came up with the whole .well-known nonsense
mxuribe joined the channel
# 03:15 tantek ideally I'd prefer to keep all such "meta" stuff as close to the real thing as possible
# 03:16 aaronpk trying to minimize the chance of conflicting filenames, .headers.[filename] seems best
# 03:18 tantek (i'm searching for how to archive a web page with headers :) )
# 03:18 tantek so how does one view the raw http headers of something on archive.org?
# 03:24 tantek so google is good at search, bad at being a content silo
# 03:25 tantek well there's that if we want to go crazy with archiving the entire http request and response
# 03:26 tantek which I'd rather not - seems like overkill, and wrong for a distributed shared project
# 03:27 tantek what if we just keep it dumb/simple, make the requests always just be a simple "GET" - no other weird params, no cookies
# 03:27 tantek and the response just has the entire raw HTTP headers - just as with the HTML, we're saving the raw stuff in case we want to reparse it later
# 03:28 tantek the HAR format seems to imply some amount of HTTP header processing into JSON - which is antithetical to the methodology of archiving
# 03:29 tantek ok, based on a quick read of that (over-designed) spec, I think saving the raw http headers as a text file is fine for us
# 03:29 tantek and the .headers.[filename] convention is fine too.
# 03:30 tantek I doubt most (if any) of the folders will contain more than just one file and its headers
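A rough sketch of the `.headers.[filename]` convention settled on above: the raw header block is stored untouched next to the body, so either can be reparsed later. The helper names are hypothetical, and the sketch assumes the raw response separates headers from body with a blank `\r\n\r\n` line and that the body needs no decoding (no chunked transfer or compression).

```python
from pathlib import Path, PurePosixPath


def headers_path_for(archived: str) -> str:
    """Map 'dir/index.html' to 'dir/.headers.index.html'."""
    p = PurePosixPath(archived)
    return str(p.with_name(".headers." + p.name))


def split_raw_response(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw HTTP response into (header block, body) at the first blank line."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    return head, body


def write_snapshot(root: Path, archived: str, raw: bytes) -> None:
    """Write the body to `archived` and the raw headers to its hidden sibling file."""
    head, body = split_raw_response(raw)
    body_path = root / archived
    body_path.parent.mkdir(parents=True, exist_ok=True)
    body_path.write_bytes(body)
    (root / headers_path_for(archived)).write_bytes(head)


if __name__ == "__main__":
    print(headers_path_for("3_C/3Kj/aaronparecki.com/index.html"))
```

Keeping the body in its own file means it still opens directly in a browser, which a single combined headers-plus-body file would not.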
mxuribe1, duckbillp and tantek joined the channel
# 06:41 aaronpk I thought about saving the whole HTTP response too, but it seemed like it would just make it that much harder to deal with later. for example you wouldn't be able to open it in a browser
andreypopp joined the channel
andreypopp and eschnou joined the channel
laurian, andreypopp, xtof, peck_lx, barnabywalters and duckbillp joined the channel
# 16:02 tantek aaronpk - IndieAuth.com copy/links need updating - they refer to 2012 ;)
duckbillp joined the channel
# 16:39 aaronpk tantek: oh I meant saving the entire HTTP response (headers\n\nbody) in the file
# 16:54 Loqi aaronpk: Update indiewebcamp links to generic instead of 2012. Add app.net logo to supported provider list
barnabywalters, eschnou, scor, Nadreck and andreypopp joined the channel
eschnou, andreypopp, tantek, jalbertbowdenii and danbri joined the channel