#tantek plus we place no copyright claims and state it is purely for library/archive purposes only, and anyone is welcome to clone (not modify) and keep the same terms
#tantek.com edited /IndieArchive (+169) "/* Thoughts */ inception was a result of one person's itch, that another thought of a scratch for"
#aaronparecki.com edited /IndieArchive (+0) "/* URL design */ update archive.org examples to a timestamp when my site had more stuff on it since they are clickable links"
#tantek aaronpk - wouldn't have happened without you venting about your itch, and the idea popping into my head as a result. I don't think either of us would have thought of this purely independently, at least not for a while
#aaronpk using the same format p3k uses to store right now, the subfolders will also be neatly organized since the only thing in the XXX/YYY folder would be the domain name
#tantek that's a reasonable time stamp trust granularity - we can also check commit log timestamps vs. claimed directory timestamps to determine archiving lag
#aaronpk relating to access control for IndieArchive... the fact that it's on GitHub will be really convenient
#aaronpk anybody is able to fork the project right now, and can always be adding files to their own branch
#aaronpk after they have a number of successful commits, and we see that their site behaves according to the project guidelines we set up, we can begin accepting pull requests
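A minimal sketch of the lag check mentioned above, comparing a claimed directory timestamp with a commit timestamp. The log does not spell out either format, so this assumes (hypothetically) an archive.org-style 14-digit UTC timestamp for the directory and an ISO 8601 commit time with an explicit offset, such as `git log --format=%cI` prints. The function name is made up for illustration.

```python
from datetime import datetime, timezone


def archiving_lag_seconds(claimed: str, committed: str) -> float:
    """Return seconds elapsed between the claimed and committed times.

    claimed   -- assumed format: YYYYMMDDhhmmss, interpreted as UTC
    committed -- assumed format: ISO 8601 with offset, e.g. 2013-08-20T12:05:00+00:00
    """
    claimed_dt = datetime.strptime(claimed, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    committed_dt = datetime.fromisoformat(committed)
    # A positive result means the commit landed after the claimed archive time.
    return (committed_dt - claimed_dt).total_seconds()
```

A large positive value would suggest a slow archiver; a negative one would mean the directory claims a time later than its own commit, which is worth a second look.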
#tantek well there's that if we want to go crazy with archiving the entire http request and response
#tantek which I'd rather not - seems like overkill, and wrong for a distributed shared project
#@alastairtouw On the one hand, there’s all this great writing on @medium. On the other hand, it’s all on @medium. #indieweb
#tantek what if we just keep it dumb/simple, make the requests always just be a simple "GET" - no other weird params, no cookies
#tantek and the response just has the entire raw HTTP headers - just as with the HTML, we're saving the raw stuff in case we want to reparse it later
#tantek the HAR format seems to imply some amount of HTTP header processing into JSON - which is antithetical to the methodology of archiving
#tantek ok, based on a quick read of that (over-designed) spec, I think saving the raw http headers as a text file is fine for us
#tantek and the .headers.[filename] convention is fine too.
#tantek I doubt most (if any) of the folders will contain more than just one file and its headers
mxuribe1, duckbillp and tantek joined the channel
#aaronpk I thought about saving the whole HTTP response too, but it seemed like it would just make it that much harder to deal with later. for example you wouldn't be able to open it in a browser
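The approach discussed above (a plain GET, the body saved as its own file so it still opens in a browser, and the raw headers saved untouched beside it) could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the default `index.html` filename, and the reading of `.headers.[filename]` as a dot-prefixed sibling file are not confirmed by the log. Network I/O is left out; `raw` stands for the bytes read from the socket.

```python
from pathlib import Path


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a bare GET request: no cookies, no extra headers.

    HTTP/1.0 is an assumption made here so the server closes the
    connection after the response and does not use chunked encoding.
    """
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def save_response(raw: bytes, folder: Path, filename: str = "index.html") -> tuple[Path, Path]:
    """Split a raw HTTP response and store body and headers separately.

    The headers are written byte-for-byte (status line included), with no
    parsing into JSON or any other structure, so they can be reparsed later.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating headers from body")
    folder.mkdir(parents=True, exist_ok=True)
    body_path = folder / filename
    headers_path = folder / f".headers.{filename}"
    body_path.write_bytes(body)
    headers_path.write_bytes(head + b"\r\n")
    return body_path, headers_path
```

With this layout a folder holds just two files, for example `index.html` and `.headers.index.html`, which matches the expectation that most folders contain one file and its headers.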