Old 28th October 2020, 19:53   #1  |  Link
Ilovetv9
Registered User
 
Join Date: Aug 2019
Posts: 14
Need help downloading stream only video from archive.org with youtube-dl

I am able to download some stream-only videos from archive.org with youtube-dl without any problems. For example, this one downloads as separate 1-minute .mp4 videos, which are combined when finished - https://archive.org/details/WJLA_200..._Peek_Special/

But another stream-only video does not work - https://archive.org/details/MSNBCW_2...ch_a_Predator/

When I try to download it, the log says "ERROR: unable to download video data: HTTP Error 403: Forbidden". It does not say that for the first link. What am I doing wrong? Why can I download the first stream-only video with no problems, while the second one is "Forbidden"? How can I download it? Is there another program that can download all stream-only archive.org videos? Please and thanks for any help.
Old 2nd May 2021, 02:17   #2  |  Link
Reino
Registered User
 
Join Date: Nov 2005
Posts: 693
In the HTML-source in a "meta"-node you can find...
Code:
<meta property="og:video" content="https://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4">
...but strangely enough you'll get an "HTTP error 403 Forbidden" when trying to open the video url.

What about youtube-dl?
Code:
youtube-dl.exe -F "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator"
[archive.org] MSNBCW_20131125_040000_To_Catch_a_Predator: Downloading webpage
[archive.org] MSNBCW_20131125_040000_To_Catch_a_Predator: Downloading JSON metadata
[info] Available formats for MSNBCW_20131125_040000_To_Catch_a_Predator:
format code  extension  resolution note
0            mp4        640x360

youtube-dl.exe -g "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator"
https://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?exact=1&start=0&end=120
The video on this website is only available as lots of artificially cut-up segments.
youtube-dl scrapes the "embed"-variant of this website, which only has the first 2-minute segment of this video in the HTML-source.
The initial url you provided does have all the cut-up-segment urls in its HTML-source. These urls work, but you'll need an HTML/JSON parser to extract them.
Luckily xidel is a command-line tool that can do just that. I'm going to assume you're on Windows btw.

The value of the "value"-attribute in this "input"-node...
Code:
<input class="js-tv3-init" type="hidden" value='{...}'/>
...has all the information you want:
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" -e "//input[@class='js-tv3-init']/@value"
{"TV3.identifier":"MSNBCW_20131125_040000_To_Catch_a_Predator",[...]"TV3.aspectratio":1.7777777777778}
To parse this information as what it really is... JSON:
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" -e "parse-json(//input[@class='js-tv3-init']/@value)"
{
  "TV3.identifier": "MSNBCW_20131125_040000_To_Catch_a_Predator",
  "TV3.embedable": 0,
  "TV3.ccnums": [".cc5", ".align", ".cc1", ".cc5"],
  "TV3.ignore_me": 61,
  "TV3.CLIP_SEC_MAX2": 60,
  "TV3.CLIP_SEC_MAX3": 180,
  "TV3.TVNRT": 0,
  "TV3.thumbzillas": [
    "000001",
    [...]
    "003646"
  ],
  "TV3.clipstream_clips": [
    "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=0/60&ignore=x.mp4",
    [...]
    "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3600/3660&ignore=x.mp4"
  ],
  "TV3.quotes": [],
  "TV3.duration": "3660.66",
  "TV3.aspectratio": 1.7777777777778
}
To extract all the urls from the "TV3.clipstream_clips"-array:
Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
-e "parse-json(//input[@class='js-tv3-init']/@value)/(TV3.clipstream_clips)()"
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=0/60&ignore=x.mp4
[...]
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3600/3660&ignore=x.mp4
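For readers without xidel, the same extraction can be sketched in Python with only the standard library. This is a minimal sketch, not a tested replacement for the xidel command: the class and function names, the SAMPLE string and the EXAMPLE_ID identifier are made up for illustration, and it assumes the page still carries the JSON in the "value"-attribute of the "input"-node shown above.

```python
import json
from html.parser import HTMLParser


class TV3InitParser(HTMLParser):
    """Remembers the value attribute of <input class="js-tv3-init">."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and attr.get("class") == "js-tv3-init":
            self.value = attr.get("value")


def clip_urls(html_text):
    """Return the urls in the "TV3.clipstream_clips" array, or [] if absent."""
    parser = TV3InitParser()
    parser.feed(html_text)
    if parser.value is None:
        return []
    data = json.loads(parser.value)
    return data.get("TV3.clipstream_clips", [])


# Hypothetical, shortened stand-in for the real page source.
SAMPLE = (
    "<input class=\"js-tv3-init\" type=\"hidden\" value='"
    '{"TV3.identifier":"EXAMPLE_ID","TV3.CLIP_SEC_MAX3":180,'
    '"TV3.clipstream_clips":['
    '"http://archive.org/download/EXAMPLE_ID/EXAMPLE_ID.mp4?t=0/60&ignore=x.mp4",'
    '"http://archive.org/download/EXAMPLE_ID/EXAMPLE_ID.mp4?t=60/120&ignore=x.mp4"],'
    '"TV3.duration":"120.5"}'
    "'/>"
)

for url in clip_urls(SAMPLE):
    print(url)
```

In practice html_text would be the source of the "details" page rather than SAMPLE.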
Use -f instead of -e to open/"follow" all the urls:
Code:
xidel.exe "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
-f "parse-json(//input[@class='js-tv3-init']/@value)/(TV3.clipstream_clips)()" ^
--download "{request-decode($url)/concat(extract(path,'.+/(.+)\.',1),'_',params/end div 60,'.mp4')}"
The --download expression produces the following file names (the files are saved to the current dir):
Code:
MSNBCW_20131125_040000_To_Catch_a_Predator_1.mp4
[...]
MSNBCW_20131125_040000_To_Catch_a_Predator_61.mp4
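How the file name is derived may be easier to follow in a small Python sketch. It assumes the url has already been redirected to the "start=...&end=..." form and that "end" is a whole multiple of the segment length; clip_filename and segment_seconds are illustrative names, not part of xidel.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def clip_filename(url, segment_seconds=60):
    """Build '<name>_<n>.mp4' from the url path and its 'end' parameter."""
    parts = urlsplit(url)
    # File name without its extension, e.g. '..._To_Catch_a_Predator'.
    stem = posixpath.splitext(posixpath.basename(parts.path))[0]
    end = int(parse_qs(parts.query)["end"][0])
    # Integer division stands in for the 'end div 60' of the xidel expression.
    return f"{stem}_{end // segment_seconds}.mp4"


print(clip_filename(
    "http://ia600901.us.archive.org/19/items/"
    "MSNBCW_20131125_040000_To_Catch_a_Predator/"
    "MSNBCW_20131125_040000_To_Catch_a_Predator.mp4"
    "?start=300&end=360&ignore=x.mp4"
))
```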
Alternatively you can do everything with a single extraction query:
Code:
xidel.exe -s --xquery ^"^
  for $x at $i in parse-json(^
    doc('https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator')//input[@class='js-tv3-init']/@value^
  )/(TV3.clipstream_clips)()^
  return^
  x:request({'url':$x})/file:write-binary(^
    concat(extract(url,'.+/(.+)\.',1),'_',$i,'.mp4'),^
    string-to-base64Binary(raw)^
  )^
"
xidel is not primarily a file download tool, so there's no progress-bar. Use curl if you're desperate:
Code:
FOR /F "delims=" %A IN ('
  xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator"
  -e "parse-json(//input[@class='js-tv3-init']/@value)/(TV3.clipstream_clips)()"
') DO @curl.exe [options] "%A"
Once downloaded you can glue them all together with ffmpeg and its concat-demuxer. First create the concat list:
Code:
(FOR %A IN ("MSNBCW_20131125_040000_To_Catch_a_Predator_*.mp4") DO @ECHO file '%A') > mylist.txt
Then process with ffmpeg:
Code:
ffmpeg.exe -f concat -safe 0 -i "mylist.txt" -c copy "MSNBCW_20131125_040000_To_Catch_a_Predator.mp4"
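One thing to watch with the concat list is ordering: if the wildcard happens to enumerate the files alphabetically (an assumption about the file system, not something verified here), "_10.mp4" would be listed before "_2.mp4". A minimal Python sketch that builds the same list sorted by the trailing number instead; concat_list is a made-up helper name:

```python
import re


def concat_list(filenames):
    """Return ffmpeg concat-demuxer lines, ordered by the trailing _<n>.mp4 number."""

    def index(name):
        match = re.search(r"_(\d+)\.mp4$", name)
        return int(match.group(1)) if match else 0

    return "".join(f"file '{name}'\n" for name in sorted(filenames, key=index))


# Hypothetical file names, deliberately out of order.
print(concat_list(["clip_10.mp4", "clip_2.mp4", "clip_1.mp4"]), end="")
```

Writing the returned text to mylist.txt gives a file that can be passed to the ffmpeg command above.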
BUT, there's a problem. At every 1-minute boundary you'll notice a small video fragment that's repeated.
As far as I can tell, there's no way to solve this completely, but I did find a way to make it less intrusive.

Sadly appending "?t=0/3661&ignore=x.mp4" doesn't work, but the extracted JSON does give away the maximum allowed segment length: the key "TV3.CLIP_SEC_MAX3" has the value 180.
Appending "?t=0/180&ignore=x.mp4" does appear to work. I suggest you decrease that value by 1 second however for a smoother transition.
So with a segment length of 180 seconds (instead of 60) there will be a third as many hiccups, and by generating the start and end numbers yourself the remaining transitions will be smoother.
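The arithmetic behind those start and end numbers can be sketched in Python. This is only an illustration under the assumptions stated in this post (duration "3660.66", maximum segment length 180, each segment ending 1 second before the next one starts); segment_ranges and segment_query are made-up names.

```python
import math


def segment_ranges(duration, max_len=180):
    """Return (start, end) pairs in seconds covering the whole duration."""
    total = math.ceil(float(duration))  # "3660.66" becomes 3661
    ranges = []
    for start in range(0, total, max_len):
        if start + max_len >= total:
            end = total  # last segment runs to the end of the video
        else:
            end = start + max_len - 1  # stop 1 second short of the next start
        ranges.append((start, end))
    return ranges


def segment_query(start, end):
    """Query string in the same form as the urls in "TV3.clipstream_clips"."""
    return f"?t={start}/{end}&ignore=x.mp4"


for start, end in segment_ranges("3660.66"):
    print(start, end)
```

For this video that gives 21 segments instead of 61.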

Code:
xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
-e "parse-json(//input[@class='js-tv3-init']/@value)/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)"
20.337

xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
--xquery ^"^
  let $json:=parse-json(//input[@class='js-tv3-init']/@value),^
      $amnt:=$json/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)^
  for $x in 0 to $amnt^
  return^
  join(^
    (^
      $x * 180,^
      if ($x eq integer($amnt)) then^
        ceiling($x * 180 + ($amnt mod integer($amnt) * 180))^
      else^
        ($x + 1) * 180 - 1^
    )^
  )^
"
0 179
180 359
360 539
[...]
3420 3599
3600 3661

xidel.exe -s "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
--xquery ^"^
  let $json:=parse-json(//input[@class='js-tv3-init']/@value),^
      $amnt:=$json/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)^
  for $x in 0 to $amnt^
  return^
  concat(^
    substring-before($json/(TV3.clipstream_clips)(1),'?'),^
    '?t=',$x * 180,'/',^
    if ($x eq integer($amnt)) then^
      ceiling($x * 180 + ($amnt mod integer($amnt) * 180))^
    else^
      ($x + 1) * 180 - 1,^
    '^&amp;ignore=x.mp4'^
  )^
"
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=0/179&ignore=x.mp4
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=180/359&ignore=x.mp4
[...]
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3420/3599&ignore=x.mp4
http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=3600/3661&ignore=x.mp4

xidel.exe "https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator" ^
--follow-kind=xquery3 -f ^"^
  let $json:=parse-json(//input[@class='js-tv3-init']/@value),^
      $amnt:=$json/(decimal(TV3.duration) div TV3.CLIP_SEC_MAX3)^
  for $x in 0 to $amnt^
  return^
  concat(^
    substring-before($json/(TV3.clipstream_clips)(1),'?'),^
    '?t=',$x * 180,'/',^
    if ($x eq integer($amnt)) then^
      ceiling($x * 180 + ($amnt mod integer($amnt) * 180))^
    else^
      ($x + 1) * 180 - 1,^
    '^&amp;ignore=x.mp4'^
  )^
" ^
--download ^"{^
  request-decode($url)/concat(^
    extract(path,'.+/(.+)\.',1),^
    '_',^
    ceiling(params/end div 180),^
    '.mp4'^
  )^
}"
__________________
My hobby website
Old 2nd May 2021, 02:21   #3  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
We can't help you to violate terms of service.
Old 2nd May 2021, 03:33   #4  |  Link
manolito
Registered User
 
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
Quote:
Originally Posted by videoh View Post
We can't help you to violate terms of service.
Which "terms of service" are you referring to? AFAIK the only thing which matters here is a violation of the forum rules.

I believe that for archived TV previews the same guidelines apply as for YouTube downloads. If you can watch it legally (which I assume is true for archived TV previews) then you can also record it legally to your VHS recorder, or today's equivalent, which is downloading it to your HDD. Of course not for commercial purposes.

You may want to reread these old posts:
https://forum.doom9.org/showthread.p...38#post1658338

And another interesting post:
https://forum.doom9.org/showthread.p...68#post1432768
Quote:
Normal users should just report posts or simply leave it to the mods
This does not include former mods...

FWIW I had no problems downloading the clips from both linked URLs in the first post with IDM (Internet Download Manager). They have a 30 day trial period.

Last edited by manolito; 2nd May 2021 at 03:37.
Old 2nd May 2021, 10:14   #5  |  Link
Reino
Registered User
 
Join Date: Nov 2005
Posts: 693
Quote:
Originally Posted by manolito View Post
I had no problems downloading the clips from both linked URLs in the first post with IDM (Internet Download Manager).
Interesting. Did this IDM show you the url (or urls?) it was using?
Old 2nd May 2021, 12:21   #6  |  Link
manolito
Registered User
 
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
Yes it does.

Clicking "Properties" for any downloaded clip in IDM reveals the download URL. For one of the clips from the second link in the first post, the URL which IDM reported was
https://ia800901.us.archive.org/19/i...0&ignore=x.mp4
Old 2nd May 2021, 13:03   #7  |  Link
stax76
Registered User
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
The 'Joe Arpaio' of the Doom9 forum doing his thing.
Old 2nd May 2021, 13:16   #8  |  Link
Reino
Registered User
 
Join Date: Nov 2005
Posts: 693
So it's using the same urls* as I extracted from the "TV3.clipstream_clips"-array.
I don't know if you've downloaded some or all of them, but if you did, then you should notice the hiccup after every minute.

*xidel automatically follows a redirected url, so there's no need to do that beforehand:
Code:
xidel.exe -s --method=HEAD "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=300/360&ignore=x.mp4" -e "$url"
http://ia600901.us.archive.org/19/items/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?start=300&end=360&ignore=x.mp4

curl.exe -Isw "%{redirect_url}\n" -o NUL "http://archive.org/download/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?t=300/360&ignore=x.mp4"
http://ia600901.us.archive.org/19/items/MSNBCW_20131125_040000_To_Catch_a_Predator/MSNBCW_20131125_040000_To_Catch_a_Predator.mp4?start=300&end=360&ignore=x.mp4
Old 3rd May 2021, 05:55   #9  |  Link
manolito
Registered User
 
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
Quote:
Originally Posted by Reino View Post
I don't know if you've downloaded some or all of them, but if you did, then you should notice the hiccup after every minute.
Yes, I also got these hiccups. But it was pretty straightforward to get rid of the repeated frames with AviDemux. Cutting only at I-Frames was all I needed for these clips.
Old 3rd May 2021, 17:43   #10  |  Link
Reino
Registered User
 
Join Date: Nov 2005
Posts: 693
I'm not familiar with AviDemux. Do you know if the same can be done with ffmpeg by any chance?
Old 4th May 2021, 01:50   #11  |  Link
manolito
Registered User
 
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,079
I believe ffmpeg uses a different approach than other tools when it comes to making cuts and edits. It uses timestamps to define the edit points instead of frame numbers. The ffmpeg GUIs I use (DMMediaConverter and WinFF) require you to enter time points either in seconds or in the hh:mm:ss format, and there is no visual control for the user. The only ffmpeg-based GUI I am aware of which lets you do this visually is DVDStyler. But there is no way to make multiple edits; only one start and one end point can be defined.
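Since the edit points have to be entered as time points rather than frame numbers, a small conversion helper can be handy. This is a rough Python sketch assuming a constant frame rate; the function names and the 25 fps value are only examples, not taken from the clips discussed here.

```python
def to_timestamp(seconds):
    """Format a number of seconds as hh:mm:ss.mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def frame_to_timestamp(frame, fps):
    """Convert a frame number to a time point, assuming a constant frame rate."""
    return to_timestamp(frame / fps)


print(frame_to_timestamp(750, 25))
```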

AviDemux is quite similar to VirtualDub when it comes to editing. Multiple edit points are supported, full visual control, unlimited redo option. It is not frame accurate, though. For frame accurate editing without reencoding I prefer SmartCutter.

If you need to use AviDemux under WinXP, avidemux_2.6.8_win32_v2.exe works fine. Get it at VideoHelp. A slightly newer no-install version linked to by Mr. mean himself is here:
http://fixounet.free.fr/avidemux/avi...win32_winxp.7z
This is version 2.6.10. It does not come with AVSProxy.exe, if you need the proxy then you can extract the archive over an installed version 2.6.8.

Last edited by manolito; 4th May 2021 at 05:59.
Old 11th August 2021, 19:47   #12  |  Link
Ilovetv9
Registered User
 
Join Date: Aug 2019
Posts: 14
Hey thanks Reino for looking at this.

I forgot about making this post, but today I tried to download the stream only predator videos from archive.org using tartube + yt-dlp because I read it is better than normal youtube-dl to download videos from the internet. But it still fails.

I don't understand why the youtube-dl people won't fix this problem. If the video is on archive.org as 1-minute mp4 files, I don't understand how it violates terms of service, like videoh says, to download the mp4 files and stitch them together like youtube-dl does for other archive.org stream-only videos that I showed in my original post.

Isn't that the whole point of youtube-dl - to download anything off the internet? So why is it so wrong to download a stream only video from archive.org where you can view the 1 minute videos anyway?
Old 11th August 2021, 20:47   #13  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
It's copyrighted material. You cannot discuss downloading that stuff per rule 6.
All times are GMT +1. The time now is 07:05.