Cool idea.
Suggestion: The order of the icons could match the order of the names: BB, GH, GL. (One of those things possibly nobody cares about, yet it irked me immediately.)
Question: Why is BitBucket not turned on by default?
At home, I’ve been optimising my compression algorithm.
The encoder hasn’t received much attention yet, but I found a very nice trick for optimising one small part and it doubled the speed of the whole thing, putting it on par with LZ4. With tweaking it should beat LZ4 quite handily.
The decoder is where I’ve been focusing my efforts, and it destroys anything in the open source world. I’ve seen 60% faster with the same or a bit better compression than LZ4 on normal non-degenerate files, and I still have work to do. There’s definitely room for tweaking, and I want to try widening to AVX2. AVX is a bit derp though so I’m not sure if that will work out.
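To illustrate the kind of inner loop being tuned here (this is NOT the author’s format or LZ4’s, just a generic LZ77-style sketch): a decoder walks a stream of tokens that are either literal runs or (distance, length) back-references copied from already-decoded output.

```python
# Generic LZ77-style decode loop (illustrative sketch only; token shape
# and names are made up, not the author's actual format).

def decode(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":                 # ("lit", b"...") literal run
            out += tok[1]
        else:                               # ("match", distance, length)
            _, dist, length = tok
            # Byte-at-a-time copy: a match may overlap its own output
            # (e.g. dist=1 repeats the last byte), so a blind memcpy of
            # the whole range would be wrong. Fast decoders vectorize
            # exactly this copy, which is where AVX/AVX2 comes in.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)

# "abcabcabc" encoded as one literal plus an overlapping match:
assert decode([("lit", b"abc"), ("match", 3, 6)]) == b"abcabcabc"
```

The overlap case is why the copy loop is the hot spot: it is the part that benefits most from wide SIMD copies with careful handling of small distances.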
This is all particularly exciting because LZ4 has the world’s fastest practical LZ encoder (ruling out things like density) and I believe I’m in the same ballpark as the world’s fastest decoder (but with much less compression :<).
Next steps are to become #1 and figure out how to turn it into money.
I believe there’s a ‘documentary’ called Silicon Valley that can be a good starting guide.
I think they even had a real-world case study toward the end involving life-like hands and cylindrical objects. Might be useful for bicycle games or something.
Good luck with your project!
I occasionally observe compression-related subjects from the sidelines. (Lately I’ve started providing moderately recent Windows binaries for some compression-related tools that aren’t widely available.)
Are you perhaps unaware of the encode.ru forum? You can look there for projects that were based on LZ4, like LZ4X or LZ5, and a lot of other interesting stuff, like the promising RAZOR, a strong LZ-based archiver, for instance. You’ll find knowledgeable people from the field there, and their insightful posts.
Sort of. I’m trying to avoid taking them head on, because I won’t win that. AFAIK they don’t put much effort into having super fast encoders, which leaves room for me to focus on end-to-end latency.
I haven’t used VLC for ages, and even back then I preferred MPC-HC (on Windows) or mplayer2 (on Linux). I’ve been a happy mpv user (on Windows and Linux) since its very beginning.
Are there any users who use both mpv and VLC often, and could shed some light on what VLC has that mpv cannot provide them?
Opening videos straight from the browser’s “open with” dialog. Using subtitles is more convenient for me in VLC. A few years ago I used VLC more, when mpv had problems with some Matroska containers; I don’t know if that is still a problem.
No GUI front-ends. I’m a mostly keyboard-oriented user, but mpv’s built-in OSC is actually good too, so sometimes I operate mpv with the mouse.
I’m not sure if I should feel embarrassed, but I wasn’t able to tell what language it was about without clicking the link to “two posts ago” and later “the first post”, because only then did the first paragraph mention it’s Rust.
Now I see that there is a rust tag here, but sometimes I visit posts after reading the title alone, i.e. without looking at the tags. And Rust is not mentioned in the post even once. I don’t know; maybe it was a deliberate choice to make people click those links at the beginning of the post?
Ok, I didn’t know that git 2.13 extended includes to allow conditional configuration. Nice stuff, but it requires organizing a specific directory layout for your repos, which may already be there, but not necessarily split between personal and work-related ones, so it may not suit everyone.
So far I’ve been doing it by having the same (personal) ~/.gitconfig everywhere plus an additional ~/.gitconfig.local, where my personal e-mail is overridden by the corporate one. ~/.gitconfig includes ~/.gitconfig.local unconditionally, but I have a small script called gitcon, so I can easily toggle whether the local version is included (gitcon local 1) or not (gitcon local 0, which simply comments out the include entry in ~/.gitconfig).
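For reference, the git ≥ 2.13 conditional-include approach mentioned above looks roughly like this (the paths and identities are made-up examples; adjust them to your own layout):

```ini
# ~/.gitconfig
[user]
    name = Jane Doe
    email = jane@personal.example      ; default: personal identity

[includeIf "gitdir:~/work/"]
    path = ~/.gitconfig-work           ; applied only for repos under ~/work/

# ~/.gitconfig-work
[user]
    email = jane@corp.example          ; corporate identity overrides here
```

The includeIf condition matches on the repository’s directory, which is exactly why it requires the personal/work split in your directory layout.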
I’ve been relying on pinboard, but eventually want to use something with higher-fidelity archives. Googling revealed this ‘awesome list’: https://github.com/iipc/awesome-web-archiving
Other things I’ve archived in one-off scenarios:
My problem isn’t so much the act of archiving, it’s figuring out how to organize it all to consume it or aggregate it for searching later - I’d like to at least sit down for a few days to comb through all my pinboard tags and give it some better structure (which leads to another thing - I kinda wish pinboard supported tag autocomplete).
Thanks for sharing the awesome list and your backup achievements.
I perfectly understand your worry about old out-of-print books.
I also agree on how hard good organization is, i.e. organization useful for further processing, like the consuming or indexing you mentioned.
One of my main problems regarding the videos I back up is that I have no solution for properly tracking what I’ve already watched. I watch stuff on various devices (mobile phone, laptop, etc.), and because a full channel backup can take a lot of space, I have to move most of it to a disk that isn’t online all the time, because my home server, from which I watch stuff, has limited storage.
Have you used Plex? It sort of tries to track watching, at least to let you continue where you left off, and the mobile apps let you sync to watch offline. XBMC/Kodi may also do this.
@przemoc Hi, I found this thread since you mentioned Shippable… I’m a co-founder :)
You can configure builds on Shippable without putting a shippable.yml at the root of the repository you want to build. While the config is still yml-based, it can live in a separate repository pretty easily. Please reach out to me at manisha@shippable.com and I can point you to a couple of examples of how to do this.
We’re also working on a UI based config which will be launched in a few weeks.
I already left a question, similar to the one I asked here, on your page via the Conversations box. It was a few days ago and apparently no one was around, so I was asked to provide my e-mail to get an answer later, but I haven’t gotten any yet. I suggest removing that kind of chat box if you don’t intend to support it.
Good to hear that there is some flexibility in Shippable, because I couldn’t find such information in your docs. It would be great if you could share details here. Or is there any specific reason you cannot write about it publicly?
Sorry for the late follow-up from our side to your question on chat. I believe you’re now in touch with our customer success team.
To clarify:
No problem, @manishas!
Indeed, I am in touch with one of your colleagues. And so far I am a bit disappointed.
Let’s visit:
There are missing statuses, even on the latest commit! And the links from those that are available are wrong, as I get a 404!
I archived it just in case:
For public repos, such build job logs, which the statuses link to, must always be accessible. They are useless otherwise. (For private repos they have to be accessible too, but for obvious reasons not necessarily publicly to anyone.)
The issue with broken links to build logs was apparently addressed by Shippable.
So I archived what we have right now in the sample project:
The links are different now, but just as the older ones gave me a 404, the new ones give me a 404 too.
Previously they looked like this:
so pretty clear.
Now they look like this:
which is a HUGE UX regression in the naming department.
Why such cryptic links for public repos? They should be human-readable.
But most importantly, they should simply work, and be accessible to anyone if it’s a public repository.
I’m honestly amazed that the service’s basic features seemingly aren’t working properly.
As I said before, this is not the forum to discuss every single email we are exchanging offline.
If you try our mainline CI workflow, the links are human readable and publicly accessible. For example, here is a link to one of my projects : https://app.shippable.com/github/manishas/basic-node/runs/16/1/console
I want to again stress that your requirement isn’t a mainstream requirement. UI-based config is clunky, un-versioned, and usually considered inferior. YAML-based config is becoming the de facto standard, and even traditional tools like Jenkins are moving toward CI config as code.
I was trying to find a creative way to make our platform work for your scenario, but maybe your needs are too specific. Good luck with your search!
@przemoc you’re right that build results for public projects should be public. That is EXACTLY the behavior when you put the shippable.yml at the root of your repo. Most customers want the config yml in the same repository since that keeps source code and build pipeline in the same place.
The workflow our customer success team suggested is an alternative workflow since you had different requirements and wanted to separate config from source code.
Anyway, since our customer success team is helping you over email, let’s take it there.
Overall, I am very disappointed in your communication style. Our customer success lead Ambarish specifically put together a sample for you and has been very responsive in clarifying all questions and issues, including the ones you raise above. I am not sure you have even forked the sample and tried it one time to see if it meets your needs. You should obviously find the best alternative that works for your scenario and needs, but some politeness in the process is always well appreciated.
Let me respond here to both of your comments in one.
Reply to 1st comment
you’re right that build results for public projects should be public. That is EXACTLY the behavior when you put the shippable.yml at the root of your repo.
So why isn’t the behavior the same when I put the file in another repo? In your first comment on my thread here you wrote:
You can configure builds on Shippable without putting a shippable.yml at the root of the repository you want to build. While the config is still yml-based, it can live in a separate repository pretty easily.
So can it live in a separate repository pretty easily or not? Apparently not, because it no longer functions that well. Now I read that:
The workflow our customer success team suggested is an alternative workflow since you had different requirements and wanted to separate config from source code.
So why was there no information from the very beginning that build logs won’t be publicly visible?
Anyway, since our customer success team is helping you over email, let’s take it there.
I hoped so, but after reading your next paragraph, I couldn’t remain silent.
Overall, I am very disappointed in your communication style. Our customer success lead Ambarish specifically put together a sample for you and has been very responsive in clarifying all questions and issues, including the ones you raise above.
I appreciate that a sample was prepared specifically for me to show that what I wanted is possible with Shippable, but in fact it was not working, not even close. When you give someone a sample, you make sure it works. That’s the whole point of a sample. But let’s look back and see if what you wrote really holds.
2 days ago, ~1.5h after getting the first mail with the sample from the customer success team (sent in response to the question I had asked via the conversation box a few days before that), I replied that the build links return 404.
The issue was completely skipped in Ambarish’s response (which he sent 3 hours later).
The next day I replied mentioning another issue (which I hadn’t caught before, because I was using a smartphone when answering the first time, and the mobile view on GitHub is limited): there are no commit statuses next to recent commits. And I repeated that I’m getting 404 errors when I try to see build logs from the commit statuses that are present.
The first paragraph of Ambarish’s response gave me some hope: “The links to the statuses are broken indeed! We are fixing this bug ASAP.” 3.5h later there was a follow-up message: “We have addressed the issue with the broken link. I had missed the showBuildStatus: true attribute on the gitRepo resource. The commit status links are correct now. You can fork the repo and try it out.”
Now I thought that maybe my first impression of the sample was wrong and Shippable actually seems to care after all. That’s nice. So I rechecked the repo. Finally there was a commit status next to the latest commit. I hovered over the tick icon, and the first thing I noticed was the horrible, cryptic URL that I already mentioned in my other comment. So I clicked it. And guess what? 404.
So I replied mentioning that I am still getting a 404 error when trying to visit the build log. I also complained about the needlessly cryptic URL for a public repo, as they should be human-readable, and I reiterated: “But most importantly they should be simply working, and be accessible for anyone if it’s public repository.” Lastly, addressing the invitation to fork, I simply stated: “There is no point in me forking the repo, if you cannot show that Shippable works even for some simple sample project.” That may have sounded harsh indeed, but how can you react when someone tries to convince you that something works while it really doesn’t? Having build logs, and being able to look at them, are basic features in this kind of service for public repositories. At that time I was honestly amazed that they seemingly weren’t working properly.
In response Ambarish sent me a “let me explain the whole story with the timeline” mail. It sort of explained why commit statuses for older commits had wrong URLs. He ended the mail by saying that they will fix how URLs to build logs look, making them more concise, and gave me an example: https://app.shippable.com/github/ambarish2012/jobs/basicnode_ci/builds/5a30c1fccf141c0700bf6cbe/console
There was one thing sorely lacking in his mail: the 404 error I was still getting for the build log of the latest commit (e4fa8ae) was not even mentioned!
I got really irritated (who wouldn’t be at that point?). Some of the paragraphs in my reply looked like this:
> 3. I then specified showBuildStatus and the status links started getting
> posted correctly.
If they're posted correctly now, why aren't they working?
404 means PAGE NOT FOUND (I hope you know that common knowledge).
And I get 404 for clicking the status icon next to your latest commit (e4fa8ae).
I even provided page archives so it would be clear.
http://archive.is/RXXgC - commits page
http://archive.is/JjV6I - "build log" that redirects to 404
Have you checked it?
If the link to build log doesn't work, then it's useless as I wrote earlier.
> https://app.shippable.com/github/ambarish2012/jobs/basicnode_ci/builds/5a30c1fccf141c0700bf6cbe/console.
That will be an improvement, definitely. But why basicnode_ci?
It's true that's where config is placed, but you're building
basicnode, so it should be reflected in the URL.
At the end I'll repeat what I wrote earlier.
Please fix the links to build logs.
Even bad looking link is better than good looking, which returns 404.
So far all links are bad.
I'm on the verge of becoming indifferent at this point, because I've
been pointing out the broken links to build logs for a few mails now,
and they are all still broken, including the latest ones supposedly
generated after fixing the configuration. I don't know why you have
been unable to fix it for this long. It's a core feature, isn't it?
Surely that wasn’t an exemplary mail of politeness, but I was barely keeping my composure, considering how the communication had looked so far.
Today I finally got response:
I perfectly understand what a 404 is :). The reason the link is a 404
for you is because you are not a member of my organization in GitHub.
Basically, I have created a simple pipeline to build my repository and
to see my pipeline, you need to be a member of my organization.
We can get into the details why you need to be a member of my
organization to view my pipelines as well if you like.
I’m baffled. Why are we talking about organizations all of a sudden? You were building a public repo, right?
So things are apparently much more complex in Shippable than they’re supposed to be, and than was presented at the beginning. Well, that could be fine. But!
And regarding this particular mail strictly, there are questions that immediately pop up after reading it:
Does anyone reading my comment still believe that the customer success team was “very responsive in clarifying all questions and issues, including the ones you raise above”?
I simply cannot agree with it. At the beginning I thought there was a genuine will to help me by showing how Shippable is capable of doing what I want. But with each further mail it seemed like things were being dragged out for some reason (not strictly time-wise, but rather in how crucial information was (not) shared and how some of my issues were constantly skipped), and my main problem wasn’t addressed until the very last mail, and even then only partially, as I am apparently supposed to ask one more time to have it explained why it is the way it is… And I mentioned so many times in my mails that build logs for a public project must be available to anyone, so it’s more than obvious that I would like to finally know these damn details, i.e. how is this sample application so different from the “most customers want the config yml in the same repository” case that it cannot provide public build logs?
I started providing reports about Shippable in my thread here because I hoped it would end up as a success story: look, they reached out to me and showed it can be done with Shippable. Then everyone reading the thread in the future would know that Shippable cares about their customers and delivers, even to non-paying ones wanting to build open-source projects there. But my hope was premature.
You say you are very disappointed in my communication style? I’m very disappointed that you sugar-coat how great the communication from Shippable’s side looked.
I am not sure you have even forked the sample and tried it one time to see if it meets your needs.
I clearly stated that I didn’t fork it and I also explained why.
You should obviously find the best alternative that works for your scenario and needs, but some politeness in the process is always well appreciated.
I was polite, but I don’t deny I got irritated at one point.
Reply to 2nd comment:
As I said before, this is not the forum to discuss every single email we are exchanging offline.
That wasn’t my intention. I do believe, though, that open-source software and services for OSS need to be transparent and open; that’s what makes them best. You showed up in my thread stating that Shippable is up to my needs, so it was only natural that I’d report back on whether it really is. I was only reporting here the issues I was facing so far.
But after seeing your two comments today, I decided I’d share more details, to make the story more complete.
If you try our mainline CI workflow, the links are human readable and publicly accessible. For example, here is a link to one of my projects : https://app.shippable.com/github/manishas/basic-node/runs/16/1/console
But I was trying what the customer success team prepared for me, which was supposed to match my needs. Talking about your mainline CI workflow here is a diversion.
I want to again stress that your requirement isn’t a mainstream requirement. UI based config is clunky, un-versioned, and usually considered inferior. YAML based config is becoming the defacto standard, and even traditional tools like Jenkins are moving towards ci config as code.
I understand that wanting to have the configuration outside of the main repo that is meant to be built is not a mainstream requirement. Otherwise I wouldn’t have posted this very question on lobste.rs asking about services that can do that.
And I have no clue why you are now talking about UI-based config. It’s another diversion. I prefer text-based configuration hands down, and YAML is fine for that. Have I criticized YAML configuration here or in any mail with Ambarish? I have not.
I just want the build configuration for the CI service to live outside of the main repo, and if that requires using a web UI (which doesn’t actually exclude using YAML there, or the possibility of versioning it, right?), then I can still consider it despite my preference for fully text-based config.
I was trying to find a creative way make our platform work for your scenario, but maybe your needs are too specific. Good luck with your search!
Maybe my needs aren’t as specific as you paint them and your platform would work for my scenario, but maybe you failed to communicate it properly and simply sell me on Shippable.
Maybe I should have accepted your proposal to mail you directly back then, because you seem to know the platform better, but at that point I had already left a question a few days earlier via the conversation box on your official site, so I believed I should go the standard way and wait for the reply, to have an experience closer to that of a standard prospective customer (who doesn’t know a co-founder’s e-mail). I think it’s fairer that way. You were CCed by the customer success team from the first mail anyway, so I guess my experience was already supposed to be better than most can expect.
Not once until today was it stated that supporting config outside of the main repo is such a completely different workflow for Shippable, with its own quirks like the lack of publicly visible build logs, even for public repositories. I still don’t know why that is. Is it by design? What’s the rationale behind it? Could it maybe be fixed?
The whole discussion would have gone completely differently if the first mail from Shippable had contained a proper sample, i.e. all the needed information:
Then my reply would be simply:
Unfortunately it didn’t go that well.
I still don’t know the answers to many of my questions. My curiosity slightly inclines me to continue the mail conversation with Ambarish in the hope that they would eventually be answered. My sanity, based on past experience and @manishas’s comments today, objects, though.
Can you make logs from pipeline jobs for public repositories publicly visible?
Can you make URLs to logs from pipeline jobs less cryptic and human-readable?
The runSh job is an Assembly Line job that is not automatically associated with a source code repository, which is why you also have to specify a gitRepo resource as an input to the job. This is the main reason I call it ‘not mainstream CI’: the runSh job was designed to separate repositories from configured jobs.
As an example, you could potentially configure builds for 10 different projects in a single shippable.yml, along with provisioning jobs, deployment jobs, etc and configure a complete end to end workflow for your application.
I hope that clarifies why the build link cannot be tied to a separate project and is simply the job name.
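Pieced together from the details mentioned in this thread (a gitRepo resource with showBuildStatus, consumed as an input by a runSh job), an out-of-repo Shippable config might look something like the sketch below. All names are hypothetical, and the exact field placement may differ from Shippable’s actual schema:

```yaml
# Hypothetical sketch only — resource/job names are invented and the
# exact schema may differ from Shippable's.
resources:
  - name: app_repo              # points at the repository actually being built
    type: gitRepo
    integration: github         # assumed name of a GitHub integration
    pointer:
      sourceName: someorg/basicnode
      branch: master
    # per the mails quoted above, commit statuses need:
    # showBuildStatus: true

jobs:
  - name: basicnode_ci          # runSh "Assembly Line" job, not tied to the repo
    type: runSh
    steps:
      - IN: app_repo            # the gitRepo resource feeds the job
      - TASK:
          - script: npm install && npm test
```

This separation is also what produces build-log URLs named after the job (basicnode_ci) rather than the repository, as discussed above.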
Thank you for your answers; they clarify some things. I really like the flexibility you provide. I may still consider Shippable for use in the future. Hopefully the matter of publicly visible links for runSh jobs will be fixed by then.
@przemoc The only reason I reached out is because you mentioned us in the comment.
Since you’ve taken the trouble of explaining your stance, let me explain mine:
Your statements were pretty condescending toward Ambarish, and while I understand you were frustrated, he is too good an engineer to be talked to this way. I felt compelled to stand up for him.
At the end of the day, we got off on the wrong foot. This is the first time, in 4 years of running the service, where I have felt compelled to stand up for my team and just say that communication needs to remain polite and respectful. I am happy to continue the conversation over a call or over email, but I won’t be visiting this board to litigate this further.
Good luck and I hope you find a service that fits your needs.
Yesterday, before I sent my comment in response to your other comment, where you wrote answers to my questions, I replied here. It was a longer comment (not as long as last time, though), where I was addressing your points. But it’s not here; I’m not sure what went wrong, and I don’t have the willpower to recreate it. I’ll only recreate what I wrote at the very end of it.
I apologize for all the words I’ve written that you think were rude, inappropriate, or condescending. I’ll apologize to Ambarish personally tomorrow.
I still use jwz’s venerable youtubedown.pl script for archiving YouTube, even though I have youtube-dl installed for mpv. Still works with nary an update.
If a site is small or eclectic enough I’ll spider it with wget, but there have been a few times where I’ve had to spend a few hours finding a single archive.org link to save. It’s on my to-do list to write automation for going through my pinboard XML to find broken/moved links, and to find forum posts: ltehacks.com went down a few months ago, and now that I’ve got a Calyx hotspot it would be useful to have those posts for reference.
I haven’t ever used youtubedown.pl, and as it’s possibly not as configurable as youtube-dl, I doubt that will change anytime soon. youtube-dl works fine, is quite well maintained, supports a lot of other sites besides YouTube, etc.
Ok, another Pinboard user. It seems like at least half of the people commenting here use this service. Isn’t finding broken links already too late (unless you have an archival account)? From what I read in other comments, it seemed that Pinboard shows if a link is no longer reachable, so why build your own tool for that?
I have a grandfathered one-time account and no archiving service. Pinboard does not check the links nor add tags in that case, and I have over 20k bookmarks.
Sometimes the content has been moved slightly (esp. if it’s an academic site, e.g. transitioning away from tilde-user directories), in which case I can usually find the content again manually. Some of them are also “read later” shortened Twitter/newspaper links from when I’m on my phone on the run, so a dead link in that case is “oh well, delete”.
My archives show long stretches of using Firefox’s Scrapbook Autosave, which would save every page I visited. But there are also multiple interruptions caused by the extension breaking, or by (the last time) Firefox breaking it.
The eBooks and CompSci papers I’ve saved are close to 50GB per my properties tab. I’d like to have copies of certain websites because they’re disappearing, even from the Wayback Machine. It’s disturbing. I’m too overloaded with things to do to email all the admins I see setting up methods to keep them up. So saving local copies might be the only thing to do.
You could possibly have a script that checks when they 404 on the source site and then makes them publicly available on your own hosting, with a message saying you’re providing it under fair use, that it isn’t available anywhere else, and to email you to have it taken down…
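The checking half of that idea is simple to sketch. Here is a minimal, hypothetical helper (names made up, standard library only) that a cron job could run over saved URLs:

```python
# Sketch of the "promote my mirror once the original 404s" idea.
# Helper names are hypothetical; only the Python standard library is used.
import urllib.error
import urllib.request

def http_status(url):
    """HEAD the url and return its HTTP status code (0 on network failure)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code          # 4xx/5xx still gives us a concrete status
    except urllib.error.URLError:
        return 0               # DNS failure, refused connection, etc.

def is_dead(status):
    # Treat network failures and 4xx/5xx as "the source is gone".
    return status == 0 or status >= 400
```

A periodic job would then flip a “publish my local copy” flag for each saved item where is_dead(http_status(url)) holds.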
I never thought about mailing site admins, but usually you have no direct contact with them.
Asking them for an archive is possibly the easiest way to back things up. I guess it could work for some smaller services, if the admins were open to cooperating in that regard, obviously.
Thanks for this simple idea. Sometimes obvious solutions are overlooked.
Usually the author’s email is on their academic page. Old-school sites often have a webmaster email at the bottom. Some work.
Most of the code I write goes on GitHub, and then is usually cloned to my home desktop and my laptop. I’ve been meaning to upload everything to my Bitbucket account, but haven’t yet.
I keep my photos on an external hard drive, which I back up to another external drive every so often (which I should probably do sometime soon). The ‘good ones’ get uploaded to SmugMug and often 500px.
I personally don’t see the point in hoarding content from the web. I use Pinboard to bookmark interesting content when I find it, and I’ll download PDFs for offline viewing, but that’s generally as far as I go.
If you want multiple git repository mirrors, just in case, then I would also consider those known to be reliable, even if they are not that well-known or don’t provide as many UX features as people know from GitHub or Bitbucket:
A nice point about them is that their frameworks are open-source, so you can host them yourself if you want.
Thanks for mentioning SmugMug and reminding me about 500px - I had totally forgotten about it.
Another Pinboard user. I wonder how I haven’t heard about it so far if it’s so popular?
There is also a project called sotoki that seemingly allows bringing StackExchange sites to Kiwix by converting their dumps into ZIM files.
Apparently the IPFS folk have Wikipedia up as something you can mirror through IPFS, but they’re working on a dynamic version that doesn’t need to be manually uploaded, which would be amazing.
I’m considering paying Pinboard for their web archiving feature, but so far it’s not been a huge pain point.
I use Pinboard’s archiving, but just for articles I’ve read and other things I’d only be mildly annoyed to lose; it’s a bit too unreliable for anything else. The archiving time is sporadic: some things get archived in a couple of hours, others can take weeks, and many of my bookmarks say they’re archived, but trying to open the archived page just causes an error.
I still use it because it’s the only one I’ve found that will archive PDFs and direct links to images. Well, that, and because I paid 5 years in advance.
Thanks for the review. It’s sad they don’t do the archiving at the moment of bookmarking. That’s what I feel is the best approach, but maybe they have so many users that reaching the front of the queue takes a week or so?
Considering you don’t think that highly of Pinboard, I’m wondering why you went with buying 5 years of service from the start.
I already had a standard Pinboard account, grandfathered in from when it was a one-off fee, when I upgraded to an archiving account, and I had been happy enough with that. My thought process was that I’d pay in advance, have everything archived, and not have to worry about it again for 5 years; I didn’t consider that it would turn out to be less reliable than I’d like.
I pay for it and use it – my only regret is activating it so late, after having added bookmarks for years – that meant many many bookmarks had already vanished. (Thankfully Pinboard lists all such errors and the specific HTTP code that caused it)
I like that they provide all the errors and HTTP codes. Are there logs too, so you can actually tell when the page stopped being reachable?
No, just the error and an option to manually trigger a retry.
It’s added as a machine tag like code:403
I joined Pinboard almost exactly 7 years ago and it has already saved my butt a bunch of times. According to my profile page, about 5% of my bookmarks are dead links at this point.
That has to be reassuring. Well, they’re not only providing fun statistics, they’re proving their value to you. I really hadn’t heard about Pinboard until today. If there were a local client for syncing the archived content locally, then I could consider buying the service and using it, but first I would need to restore the bookmarking habit that I somehow lost many years ago.
Interesting. I guess a bookmark-like service on top of archive.is / web.archive.org could be created. Or maybe such a thing already exists for free.
I’ve been running a backup of my Shaarli bookmarks, but neglecting it over the last few months. Sadly, a few of the bookmarks were already offline or require Flash Player or other silly stuff.
I was working on a project to replace the weird Python tool I was using for this, since it didn’t screenshot the entire page, only a 1920x1080 section of it (or some other screen setting). I might redo it, considering Firefox now offers the option to screenshot a page at full height.
I also archive a lot of stuff in my Nextcloud folder; I have some arXiv papers on there. The biggest part might be my image collection of memes and fandom art material (some fanfiction and other artwork sites are also backed up), which totals about 47.5GB of data.
I haven’t heard about Shaarli before. Thanks for mentioning!
Ideally, any online bookmarking service should archive the current version of the bookmarked page along with the URL. Local bookmarking apps should provide a similar dump locally, but if the user is okay with syncing their data with the service’s server (*), then a request for a snapshot should be performed online too, e.g. using the earlier-mentioned archive.is and/or web.archive.org, so you’d have both an offline backup and an online backup.
(*) It should never be enforced, though. I am fed up with mobile apps wanting to keep all my data in the cloud - it may be fine for some stuff, but not necessarily for everything.
Shaarli has a plugin to open an archive.org page, but sadly it’s not automatic.
Personally I don’t think bothering archive.is and archive.org about this would be misplaced, especially if you back up a lot (I have about 2 GB worth of site data at the moment; it’s probably double that by now). I did have the entire thing online somewhere, though…
Whenever I come across a PDF I always save a copy before opening it, since they’re very cheap to store and often useful to refer to. I try to add them to a large Bibtex file too, but that takes more effort so I’ve only gotten through a fraction of them.
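Since the BibTeX step is the part that takes effort, a tiny helper that stamps out a skeleton entry for each saved file lowers the friction. A minimal sketch; the function name, citation-key scheme, and choice of fields are all my own invention, not any standard:

```python
from pathlib import Path

def bibtex_for_pdf(path, title, year, url=None):
    """Build a minimal @misc BibTeX entry for a locally saved PDF.

    The citation key is derived from the file name; the metadata has to
    be supplied by hand, since PDFs rarely carry reliable metadata.
    """
    key = Path(path).stem.replace(" ", "-").lower()
    fields = {
        "title": title,
        "year": str(year),
        "howpublished": f"Local copy of \\url{{{url}}}" if url else "Local PDF copy",
        "note": f"File: {path}",
    }
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@misc{{{key},\n{body}\n}}"

print(bibtex_for_pdf("papers/Smith 2016.pdf", "An Example Paper", 2016))
```

Pipe the output into the big .bib file right after saving, and the backlog never forms in the first place.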
I have a directory of “talks” for lectures, presentations, etc. which I’ve found interesting enough to keep (mostly saved from YouTube). I only move stuff in there after watching, and deciding whether or not I might ever watch it again. I have a subdirectory for TED talks, since (a) I’ve saved loads and (b) their short/shallow nature is rather different from the long/deep nature of a lecture.
I don’t trust sites like GitHub with hosting code; they’re useful as mirrors, but I keep clones of all the software I’ve ever written, along with all the projects I’ve ever bothered checking out or cloning (outside of build scripts).
Yeah, I also save most of my PDFs. When I don’t, it’s mostly by accident: I skim one quickly in the browser and forget to hit the download button. Sadly I don’t organize them immediately after downloading, so they pile up to the point where it becomes a pain; then I start organizing them, but usually fail to do so to the full extent.
Some conferences used to provide content on their FTP servers, or on HTTP servers with a file index, and in such cases lftp is invaluable: it can work not only with FTP, obviously, but even with a file index over HTTP, which is not a widely known feature. So yeah, you can mirror a file server served over HTTP with it; I find wget with its recursive features a bit clunkier here. With lftp I can, for instance, estimate the needed free space by invoking the du -hs command, and it will visit each subdirectory to do the calculation. Sometimes a non-standard file index is used, so it may not always work, but it’s still a great feature.
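For the curious, that du -hs trick can be approximated in a few lines: walk the index page, recurse into subdirectories, and HEAD each file for its Content-Length. This is a rough sketch using only the Python standard library; it assumes a plain autoindex page (one link per entry), so fancy or non-standard indexes will confuse it, just as they confuse lftp:

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class _LinkParser(HTMLParser):
    # Collect entry hrefs from an auto-generated file index page,
    # skipping sort-order links, absolute paths, and parent links.
    def __init__(self):
        super().__init__()
        self.hrefs = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and not value.startswith(("?", "/", "..")):
                    self.hrefs.append(value)

def index_size(url):
    """Recursively sum file sizes under an HTTP directory index,
    roughly what `du -hs` does inside lftp."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        page = resp.read().decode(charset)
    parser = _LinkParser()
    parser.feed(page)
    total = 0
    for href in parser.hrefs:
        child = urljoin(url, href)
        if href.endswith("/"):            # subdirectory: recurse into it
            total += index_size(child)
        else:                             # file: ask the server for its length
            req = urllib.request.Request(child, method="HEAD")
            with urllib.request.urlopen(req) as head:
                total += int(head.headers.get("Content-Length", 0))
    return total
```

Point index_size() at the index root and it returns the total in bytes, without downloading a single file body.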
GitHub is not better or worse than most other code-hosting solutions (if we’re talking about plain repository hosting that you access remotely with your favorite tools). But as long as a decentralized VCS is used under the hood, you can easily make a full clone (git clone --mirror and similar solutions), which is a great thing. That said, I very rarely mirror repositories; the main branch is usually good enough.
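For reference, a mirror is just one git invocation, and refreshing it is another; a rough wrapper might look like this (the mirror() helper and its refresh-then-clone fallback are my own sketch, not a standard tool):

```python
import subprocess

def mirror(remote_url, dest):
    """Fetch-or-clone a full mirror of a repository: every branch,
    tag, and ref, not just the default branch."""
    try:
        # If dest is already a mirror, just refresh it.
        subprocess.run(["git", "-C", dest, "remote", "update", "--prune"],
                       check=True, capture_output=True)
    except subprocess.CalledProcessError:
        # Otherwise make the initial bare mirror clone.
        subprocess.run(["git", "clone", "--mirror", remote_url, dest],
                       check=True, capture_output=True)
```

Run it from cron against every remote you care about and you have a local copy that survives the hosting site going away.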
I am also wondering if there would be some value in creating a site where people could state what they back up, so that in case some content goes down, you would know where to look for help and file a request for a reshare.
The potential problem with this approach is that people can download videos in different ways (and for good reasons). Some may get them using youtube-dl with options like the ones I’ve shown in my post’s text (to be more mobile-friendly), but others may go with the default best, so they’ll get different results (e.g. VP9 instead of H.264). Even if the options are the same, when you combine audio and video the results can vary slightly because of different ffmpeg versions used, and so on. Plain BitTorrent isn’t good enough, because the file hashes will too easily differ. You’d need some video-container- and codec-aware P2P, where checksumming happens at the video-data level rather than the whole-file level, since files can differ even when the actual video or audio bitstream is the same. But as I wrote, even this could only sort of work for videos downloaded with the same settings, because if the bitstreams differ, you can’t convert one into the other.
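A toy illustration of the difference, with the “container” reduced to a dict of muxer metadata plus raw bitstreams (a real implementation would of course have to parse MP4/MKV structure instead):

```python
import hashlib

def file_hash(container):
    """Hash the container the way BitTorrent effectively would:
    every byte, muxer metadata included."""
    blob = container["muxer"] + b"".join(container["streams"])
    return hashlib.sha256(blob).hexdigest()

def stream_hash(container):
    """Hash only the audio/video bitstreams, ignoring the packaging."""
    h = hashlib.sha256()
    for stream in container["streams"]:
        h.update(stream)
    return h.hexdigest()

# Identical bitstreams, muxed by two different ffmpeg builds:
a = {"muxer": b"Lavf57.83.100", "streams": [b"<h264 es>", b"<aac es>"]}
b = {"muxer": b"Lavf58.12.100", "streams": [b"<h264 es>", b"<aac es>"]}

assert file_hash(a) != file_hash(b)      # whole-file hashes diverge
assert stream_hash(a) == stream_hash(b)  # payload hashes still match
```

So two people who downloaded the same formats could still deduplicate at the bitstream level, while whole-file hashing would treat their copies as unrelated.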
Thanks for mentioning dat protocol. I heard about IPFS, but somehow never about dat (or I simply forgot).
It could be kind of that, I guess (please check the other comment here and my reply), but it’s not what I had in mind.
I was thinking more of a metadata-sharing service than a data-sharing service. What I mean is that the service itself wouldn’t provide any way of sending files, videos, etc. It would only let people advertise what they are backing up from the internet (possibly how, and how much of it they presently have), obviously in some very organized way (e.g. YouTube channel, YouTube playlist, FTP mirror, etc.), so others could search for the same thing if in need. This advertising should be done repeatedly, so some kind of tracking app could be needed, possibly with many plugins (a youtube-dl download-archive reader, etc.).
If what you search for is already in the database (i.e. some users are advertising that they back it up), you would be able to message the users who have it and figure out how to obtain the content from them. I simply wouldn’t concentrate on providing concrete sharing solutions, as I don’t think that would be the point here.
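As a sketch of what such a tracking plugin could look like: youtube-dl’s --download-archive file is just lines of “extractor video-id”, so turning it into an advertisable summary is trivial. The function names and the summary format here are hypothetical, not part of any existing service:

```python
def read_download_archive(lines):
    """Parse youtube-dl --download-archive lines ('extractor video-id')
    into a {extractor: set_of_video_ids} mapping."""
    holdings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        extractor, _, video_id = line.partition(" ")
        holdings.setdefault(extractor, set()).add(video_id)
    return holdings

def advertisement(holdings):
    """Summarize holdings as the metadata the service would publish:
    which extractors, and how many items per extractor."""
    return sorted((extractor, len(ids)) for extractor, ids in holdings.items())

archive = ["youtube dQw4w9WgXcQ", "youtube abc123def45", "vimeo 76979871"]
print(advertisement(read_download_archive(archive)))
# [('vimeo', 1), ('youtube', 2)]
```

The plugin would re-read the archive file periodically and push the summary, so the database stays current without anyone uploading actual content.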
I tend towards archiving, sorting, and cataloguing all my data. So I have full site rips of several science fiction blogs as well as textfiles.com, Bruce Schneier’s blog, and quite a few others. Along with:
- UC Berkeley’s online video lectures
- All of C3
- All of Black Hat
- All of DEF CON
- Several TB of full TV series
- Several thousand movies
- Twenty thousand plus books
- Full archives of Byte, Mondo, 2600 magazine, etc.
- And much, much more.
Admirable dedication.
I have to add that I hate when there are online video courses I bought that I cannot easily download, like from Thinkific. I know I can watch them anytime I want, but what if the site goes down, or the creator closes the account with all the courses he or she was selling? I don’t like it (just like I don’t like the subscription model in software: if I buy something, I want to have it accessible perpetually, and obviously offline too).
I know it’s there to prevent piracy, but the typical thing with anti-piracy protections is that they make users’ lives harder, while pirates will somehow grab the content anyway if they really want to.
I am trying to minimize manually checking websites for updates, so I just download everything and look at the list in a text editor. I also try to generally increase the fraction of online things that I read by downloading them, converting them to text using document.documentElement.innerText (with some minor extra scripts to put hyperlink targets inside the text), and opening the result in an editor. Of course I don’t bother to delete either the HTML or the text afterwards (and I, or rather my scripts, do record source URLs).
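A crude stand-in for that conversion can even be done outside the browser with the standard library’s HTML parser. It ignores CSS and script-generated content, so it is rougher than the real innerText, and the bracketed-link convention is just one arbitrary choice:

```python
from html.parser import HTMLParser

class TextWithLinks(HTMLParser):
    """Extract visible-ish text and inline each hyperlink's target
    right after its anchor text, e.g. 'docs [https://...]'."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._href = None
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.parts.append(f"[{self._href}]")  # put the target into the text
            self._href = None
    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

def to_text(html):
    parser = TextWithLinks()
    parser.feed(html)
    return " ".join(parser.parts)

print(to_text('<p>See the <a href="https://example.com/docs">docs</a>.</p>'))
# See the docs [https://example.com/docs] .
```

The nice property of inlining targets is that the saved text file remains self-contained: you can follow a link later even with the original HTML long gone.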
Well, there are multiple things.
I consider most of the web design actively harmful, as in: a text dump has minor inconveniences, but the site as designed is usually even less readable. Comment threads are sometimes an exception, and in case of comment threads Elinks is usually better than innerText (but it has other drawbacks; maybe I should find a way to combine best of both worlds in some way).
I want to have tools that gradually reduce the attack surface of web browsing. A grab-then-read workflow (where, once the Firefox instance exits, all processes of the corresponding user are killed) will hopefully let me gradually increase the sandboxing.
This workflow means that if I save something, I actually see what I have saved.
Most of the sites see almost-normal Firefox visits; and I do have an option to apply the grabbing script to something opened in an interactive Firefox instance (which is still UID-isolated, network-namespaced etc.), which might be in a state that is hard to obtain automatically (for example, some subset of threads is loaded to a greater depth).
I’m working towards something similar, except that as much as possible I want to send the resulting data to my printer (or maybe an e-reader, if I get one for Christmas?) as a batch job every morning. I was planning on using a Dockerized Chrome that I found somewhere. How are you automating Firefox to do this? Selenium? Print-to-PDF seems to be missing from the Selenium API, so I might have to use another tool to get my PDFs.
No, I cut out the middleman. I just use Marionette and the official Marionette Python client from Mozilla, which I use to execute JavaScript code, sometimes generated by Bash scripts, but oh well. I also use network namespaces so that each instance can have port 2828 to itself for Marionette.
Marionette allows execution of JavaScript code in the context of the Firefox UI. For example (the code is lifted from Firefox tests, which are the main use of Marionette), Components.classes["@mozilla.org/gfx/printsettings-service;1"].getService(Components.interfaces.nsIPrintSettingsService) seems to evaluate to an instance of nsIPrintSettingsService. Hopefully some browsing through the XUL reference will give you a solution for printing in the current Firefox release; no guarantees about when something will change…
Another option is to run Firefox in its own (virtual) X session, run window.print() then find the print dialog and send it the needed input events.
Are your scripts available somewhere? Does a write-up of your method exist? I’d be a huge fan of using that.
A separate problem is that I need to create/delete a ton of users, which requires root access, and my current permission-check code for that is part of a Lisp project where I use sinit as PID 1 and most of the system-management stuff is performed inside an SBCL process.
I hope to clean up and write up that Lisp part at some point…
Are you interested enough to participate in cleanup of the part relevant to your interests (and probably in implementation of an alternative UID-spawning backend, as you are interested only in the Firefox part)?
I’m interested, sure, but I can’t say in all honesty that I’d have enough time to inject significant effort into the project. Is the code already in a public repository somewhere, or is that in the future too? I’d rather not promise anything, but I really would like an opportunity to touch some Lisp code.
Well, there are too many assumptions to just put it in a public repository and hope anyone could reproduce it (some parts assume my Lisp daemon, some parts assume Nix, the package manager, is available, various parts use my scripts initially written for other reasons, etc.). Have I mentioned I feel significantly less comfortable when more than one assumption is broken on something I use as a laptop/desktop?
I could set up a repository for that, put it there layer by layer, and ask you to check whether simple tests work in your environment (for each layer). At some point the simple test will be «please check if it correctly downloads 20 entries from Schneier’s blog starting with {URL}». I am not asking you to write much code for this, but I need some feedback (and some positive reinforcement) to undertake it.
If you are willing to skim and run as root a trimmed-down version of my Common Lisp system-management daemon (you don’t run code provided by conspicuously pseudonymous strangers on the Web as root without skimming it first, right?), I would just need to separate out the relevant parts of my setup without writing much new code.
In any case I plan to eventually publish some rewritten version of all that; hopefully in February 2018, as a write-up to submit to European Lisp Symposium (this would hopefully be about Lisp-for-system-policy, and controlling Firefox would be one of the features).
I think you underestimate the value of reading code even without the ability to run it.
But deciding to publish it at any date is generous of you, so don’t read this as me pressuring you to up your schedule :)
OK, I tried to see what could be pushed as-is. But even cleaning up the first step (a network-namespace wrapper script, with passthrough done via socat) turns out to be not completely trivial… It still leaves socat processes behind from time to time (which doesn’t matter much in my specific case, where the lack of persistence incentivizes me to run a reaper anyway, but it obviously should be cleaned up, and I failed to do that cheaply).
I understand where you are coming from. Thanks for sharing your approach, which I guess is most likely unpopular.
Related and useful links: