Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
28th October 2020, 19:53 | #1 | Link |
Registered User
Join Date: Aug 2019
Posts: 14
|
Need help downloading stream only video from archive.org with youtube-dl
I am able to download some stream-only videos from archive.org with youtube-dl with no problems. For example, this one downloads all the separate 1-minute .mp4 videos and combines them when finished - https://archive.org/details/WJLA_200..._Peek_Special/
But another stream-only video that does not work is this one - https://archive.org/details/MSNBCW_2...ch_a_Predator/ When I try to download it, the log says "ERROR: unable to download video data: HTTP Error 403: Forbidden". It does not say that when downloading the first video link. What am I doing wrong? Why can I download the first stream-only video with no problems, while the second link says it's "Forbidden"? How can I download it? Is there another program that can download all stream-only archive.org videos? Please and thanks for any help. |
2nd May 2021, 02:17 | #2 | Link |
Registered User
Join Date: Nov 2005
Posts: 693
|
In the HTML-source in a "meta"-node you can find...
Code:
<meta property="og:video" content="https://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4">
What about youtube-dl?
Code:
youtube-dl.exe -F "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator"
[archive.org] MSNBCW_20131125_040000_To_Catch_a_Predator: Downloading webpage
[archive.org] MSNBCW_20131125_040000_To_Catch_a_Predator: Downloading JSON metadata
[info] Available formats for MSNBCW_20131125_040000_To_Catch_a_Predator:
format code  extension  resolution note
0            mp4        640x360

youtube-dl.exe -g "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator"
https://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?exact=1&start=0&end=120
youtube-dl scrapes the "embed"-variant of this website, which only has the first 2-minute segment of this video in the HTML-source. The initial url you provided does have all the cut-up-segment urls in its HTML-source. These urls work, but you'll need an HTML/JSON parser to extract them. Luckily, xidel is a command-line tool that can do just that. I'm going to assume you're on Windows, btw.
The value of the "value"-attribute in this "input"-node...
Code:
<input class="js-tv3-init" type="hidden" value='{...}'/>
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" -e "//input[@class='js-tv3-init']/@value"
{"TV3.identifier":"MSNBCW_20131125_040000_To_Catch_a_Predator",[...]"TV3.aspectratio":1.7777777777778}
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" -e "parse-json(//input[@class='js-tv3-init']/@value)"
{
  "TV3.identifier": "MSNBCW_20131125_040000_To_Catch_a_Predator",
  "TV3.embedable": 0,
  "TV3.ccnums": [".cc5", ".align", ".cc1", ".cc5"],
  "TV3.ignore_me": 61,
  "TV3.CLIP_SEC_MAX2": 60,
  "TV3.CLIP_SEC_MAX3": 180,
  "TV3.TVNRT": 0,
  "TV3.thumbzillas": [
    "000001",
    [...]
    "003646"
  ],
  "TV3.clipstream_clips": [
    "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=0/60&ignore=x.mp4",
    [...]
    "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3600/3660&ignore=x.mp4"
  ],
  "TV3.quotes": [],
  "TV3.duration": "3660.66",
  "TV3.aspectratio": 1.7777777777778
}
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
  -e "parse-json(//input[@class='js-tv3-init']/@value)/(TV3.clipstream_clips)()"
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=0/60&ignore=x.mp4
[...]
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3600/3660&ignore=x.mp4
Code:
xidel.exe "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
  -f "parse-json(//input[@class='js-tv3-init']/@value)/(TV3.clipstream_clips)()" ^
  --download "{request-decode($url)/concat(extract(path,'.+/(.+)\.',1),'_',params/end div 60,'.mp4')}"
Code:
MSNBCW_20131125_040000_To_Catch_a_Predator_1.mp4
[...]
MSNBCW_20131125_040000_To_Catch_a_Predator_61.mp4
Code:
xidel.exe -s --xquery ^"^
  for $x at $i in parse-json(^
    doc('https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator')//input[@class='js-tv3-init']/@value^
  )/(TV3.clipstream_clips)()^
  return^
  x:request({'url':$x})/file:write-binary(^
    concat(extract(url,'.+/(.+)\.',1),'_',$i,'.mp4'),^
    string-to-base64Binary(raw)^
  )^
"
Code:
FOR /F "delims=" %A IN (' xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" -e "parse-json(//input[@class='js-tv3-init']/@value)/(TV3.clipstream_clips)()" ') DO @curl.exe [options] "%A"
Code:
(FOR %A IN ("MSNBCW_20131125_040000_To_Catch_a_Predator_*.mp4") DO @ECHO file '%A') > mylist.txt
Code:
ffmpeg.exe -f concat -safe 0 -i "mylist.txt" -c copy "MSNBCW_20131125_040000_To_Catch_a_Predator.mp4"
As far as I can tell, there's no way to completely solve the hiccups between the segments, but I did find a way to make them less intrusive.
Sadly, appending "?t=0/3661&ignore=x.mp4" doesn't work, but the extracted JSON does give away the maximum allowed segment length: the key "TV3.CLIP_SEC_MAX3" has the value 180. Appending "?t=0/180&ignore=x.mp4" does appear to work. I suggest you decrease that value by 1 second, however, for a smoother transition.
So with a segment length of 180 seconds (instead of 60) there will be a third as many hiccups, and by creating the start- and end numbers yourself the remaining transitions will be smoother.
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
  -e "parse-json(//input[@class='js-tv3-init']/@value)/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)"
20.337

xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
  --xquery ^"^
    let $json:=parse-json(//input[@class='js-tv3-init']/@value),^
        $amnt:=$json/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)^
    for $x in 0 to $amnt^
    return^
    join(^
      (^
        $x * 180,^
        if ($x eq integer($amnt)) then^
          ceiling($x * 180 + ($amnt mod integer($amnt) * 180))^
        else^
          ($x + 1) * 180 - 1^
      )^
    )^
  "
0 179
180 359
360 539
[...]
3420 3599
3600 3661

xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
  --xquery ^"^
    let $json:=parse-json(//input[@class='js-tv3-init']/@value),^
        $amnt:=$json/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)^
    for $x in 0 to $amnt^
    return^
    concat(^
      substring-before($json/(TV3.clipstream_clips)(1),'?'),^
      '?t=',$x * 180,'/',^
      if ($x eq integer($amnt)) then^
        ceiling($x * 180 + ($amnt mod integer($amnt) * 180))^
      else^
        ($x + 1) * 180 - 1,^
      '^&ignore=x.mp4'^
    )^
  "
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=0/179&ignore=x.mp4
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=180/359&ignore=x.mp4
[...]
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3420/3599&ignore=x.mp4
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3600/3661&ignore=x.mp4

xidel.exe "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
  --follow-kind=xquery3 -f ^"^
    let $json:=parse-json(//input[@class='js-tv3-init']/@value),^
        $amnt:=$json/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)^
    for $x in 0 to $amnt^
    return^
    concat(^
      substring-before($json/(TV3.clipstream_clips)(1),'?'),^
      '?t=',$x * 180,'/',^
      if ($x eq integer($amnt)) then^
        ceiling($x * 180 + ($amnt mod integer($amnt) * 180))^
      else^
        ($x + 1) * 180 - 1,^
      '^&ignore=x.mp4'^
    )^
  " ^
  --download ^"{^
    request-decode($url)/concat(^
      extract(path,'.+/(.+)\.',1),^
      '_',^
      ceiling(params/end div 180),^
      '.mp4'^
    )^
  }"
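For anyone who isn't on Windows or doesn't have xidel, the same two steps (pulling the JSON out of the "js-tv3-init" input and building the 180-second "?t=start/end" urls) can be roughly sketched in Python. This is only a sketch under assumptions: the class and function names are made up for illustration, it relies on the page still carrying the keys shown above ("TV3.duration", "TV3.CLIP_SEC_MAX3" and "TV3.clipstream_clips"), and it doesn't download anything by itself.

```python
import json
import math
from html.parser import HTMLParser


class TV3InitParser(HTMLParser):
    """Remembers the value attribute of <input class="js-tv3-init">."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags (<input .../>) through here.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("class") == "js-tv3-init":
            self.value = attributes.get("value")


def extract_tv3_metadata(page_source):
    """Return the JSON object stored in the js-tv3-init input as a dict."""
    parser = TV3InitParser()
    parser.feed(page_source)
    if parser.value is None:
        raise ValueError("no <input class='js-tv3-init'> found")
    return json.loads(parser.value)


def segment_windows(duration, max_len):
    """Return (start, end) pairs in seconds.

    Every window ends 1 second before the next one starts; the last
    window ends at the duration rounded up to a whole second.
    """
    total = math.ceil(duration)
    windows = []
    start = 0
    while start < total:
        if start + max_len >= total:
            end = total
        else:
            end = start + max_len - 1
        windows.append((start, end))
        start += max_len
    return windows


def build_segment_urls(meta):
    """Build one '?t=start/end&ignore=x.mp4' url per window."""
    # Assumption: the first clip url minus its query string is the base url.
    base = meta["TV3.clipstream_clips"][0].split("?", 1)[0]
    duration = float(meta["TV3.duration"])
    max_len = int(meta["TV3.CLIP_SEC_MAX3"])
    return [
        f"{base}?t={start}/{end}&ignore=x.mp4"
        for start, end in segment_windows(duration, max_len)
    ]
```

Feed extract_tv3_metadata() the source of the "details" page (saved from the browser, for instance) and hand the result to build_segment_urls(); the resulting list can then go to whatever downloader you prefer.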
__________________
My hobby website |
2nd May 2021, 03:33 | #4 | Link | |
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
|
Which "terms of service" are you referring to? AFAIK the only thing which matters here is a violation of the forum rules.
I believe that for archived TV previews the same guidelines apply as for YouTube downloads. If you can watch it legally (which I assume is true for archived TV previews), then you can also legally record it to your VHS recorder, or today's equivalent, which is downloading it to your HDD. Of course not for commercial purposes. You may want to reread these old posts: https://forum.doom9.org/showthread.p...38#post1658338 And another interesting post: https://forum.doom9.org/showthread.p...68#post1432768 Quote:
FWIW I had no problems downloading the clips from both linked URLs in the first post with IDM (Internet Download Manager). It has a 30-day trial period. Last edited by manolito; 2nd May 2021 at 03:37. |
|
2nd May 2021, 10:14 | #5 | Link |
Registered User
Join Date: Nov 2005
Posts: 693
|
Interesting. Did this IDM show you the url (or urls?) it was using?
__________________
My hobby website |
2nd May 2021, 12:21 | #6 | Link |
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
|
Yes it does.
Clicking "Properties" for any downloaded clip in IDM reveals the download URL. For one of the clips from the second link in the first post, the URL which IDM reported was https://ia800901.us.archive.org/19/i...0&ignore=x.mp4 |
2nd May 2021, 13:03 | #7 | Link |
Registered User
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
|
The 'Joe Arpaio' of the Doom9 forum doing his thing.
__________________
https://github.com/stax76/software-list https://www.youtube.com/@stax76/playlists |
2nd May 2021, 13:16 | #8 | Link |
Registered User
Join Date: Nov 2005
Posts: 693
|
So it's using the same urls* as I extracted from the "TV3.clipstream_clips"-array.
I don't know if you've downloaded some or all of them, but if you did, then you should notice the hiccup after every minute.
*xidel automatically follows a redirected url, so there's no need to do that beforehand:
Code:
xidel.exe -s --method=HEAD "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=300/360&ignore=x.mp4" -e "$url"
http://ia600901.us.archive.org/19/items/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?start=300&end=360&ignore=x.mp4

curl.exe -Isw "%{redirect_url}\n" -o NUL "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=300/360&ignore=x.mp4"
http://ia600901.us.archive.org/19/items/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?start=300&end=360&ignore=x.mp4
__________________
My hobby website |
3rd May 2021, 17:43 | #10 | Link |
Registered User
Join Date: Nov 2005
Posts: 693
|
I'm not familiar with AviDemux. Do you know if the same can be done with ffmpeg by any chance?
__________________
My hobby website |
4th May 2021, 01:50 | #11 | Link |
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
|
I believe ffmpeg uses a different approach than other tools when it comes to making cuts and edits. It uses timestamps to define the edit points instead of frame numbers. The ffmpeg GUIs I use (DMMediaConverter and WinFF) require you to enter time points either in seconds or in the hh:mm:ss format, and there is no visual control for the user. The only ffmpeg-based GUI I am aware of which lets you do this visually is DVDStyler. But there is no way to make multiple edits; only one start and end point can be defined.
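To illustrate the timestamp-based approach, here is a rough Python sketch that builds an ffmpeg command line for cutting a single range without re-encoding. The helper names and file names are hypothetical; "-ss", "-to" and "-c copy" are standard ffmpeg options, but with stream copy the cut can only start on a keyframe, so don't expect it to be frame accurate.

```python
def to_timestamp(seconds):
    """Format whole seconds as hh:mm:ss, a form ffmpeg accepts for -ss and -to."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def ffmpeg_cut_command(source, start, end, output):
    """Return the argument list for a stream-copy cut from start to end (seconds)."""
    return [
        "ffmpeg",
        "-i", source,
        "-ss", to_timestamp(start),  # where the cut begins
        "-to", to_timestamp(end),    # where the cut ends
        "-c", "copy",                # copy the streams, no re-encoding
        output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, only to show the resulting command line.
    print(" ".join(ffmpeg_cut_command("input.mp4", 60, 150, "cut.mp4")))
```

The list can be passed as-is to subprocess.run(). For several edit points you would cut each range to its own file and join the pieces afterwards.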
AviDemux is quite similar to VirtualDub when it comes to editing. Multiple edit points are supported, full visual control, unlimited redo option. It is not frame accurate, though. For frame accurate editing without reencoding I prefer SmartCutter. If you need to use AviDemux under WinXP, avidemux_2.6.8_win32_v2.exe works fine. Get it at VideoHelp. A slightly newer no-install version linked to by Mr. mean himself is here: http://fixounet.free.fr/avidemux/avi...win32_winxp.7z This is version 2.6.10. It does not come with AVSProxy.exe, if you need the proxy then you can extract the archive over an installed version 2.6.8. Last edited by manolito; 4th May 2021 at 05:59. |
11th August 2021, 19:47 | #12 | Link |
Registered User
Join Date: Aug 2019
Posts: 14
|
Hey thanks Reino for looking at this.
I forgot about making this post, but today I tried to download the stream-only predator videos from archive.org using tartube + yt-dlp, because I read it is better than normal youtube-dl for downloading videos from the internet. But it still fails. I don't understand why the youtube-dl people won't fix this problem. If the video is on archive.org as 1-minute mp4 files, I don't understand how, as videoh says, it violates the terms of service to download the mp4 files and stitch them together, like youtube-dl does for the other stream-only archive.org videos that I showed in my original post. Isn't that the whole point of youtube-dl - to download anything off the internet? So why is it so wrong to download a stream-only video from archive.org where you can view the 1-minute videos anyway? |
|
|