Doom9's Forum > Announcements and Chat > General Discussion
5th September 2011, 07:44   #1
Rouhi
Registered User
 
Join Date: Apr 2011
Posts: 64
How to handle huge file processing in video retrieval applications?

I am researching video similarity detection. I have written a program in C++ that works on the TRECVID videos, which comprise 8300 videos. The program performs a huge number of file I/O transactions (about 26,000 files averaging 100 KB each) and processes the data inside the files. A PC cannot handle this efficiently; the program is very, very slow when I run it.
I have noticed that file I/O is not the bottleneck, because the HDD LED does not blink during processing. When I look at the Task Manager in Windows, I see that the program uses only one CPU core: that core reaches 100% while the others remain unused. What's your suggestion? Cluster programming, multi-threaded multi-core programming, or Hadoop? Do you have any idea? Have you had the same problem?
5th September 2011, 13:22   #2
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
If your "similarity detection" is not I/O-bound but CPU-bound, and you are not using multiple threads yet (the CPU load is on only one core), then you should be able to speed things up with multi-threading. In your scenario there are two approaches, and possibly a combination of both: either you do "coarse-grained" multi-threading by processing multiple videos in parallel (provided the videos can be processed independently), or you do "fine-grained" multi-threading by parallelizing the similarity detection algorithm itself. It's impossible to give advice on the latter without knowing the algorithm in detail, but in general you can think about processing multiple frames in parallel and/or dividing each frame into multiple partitions that can be processed in parallel...
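As a rough illustration of the "fine-grained" variant, here is a minimal C++11 sketch that divides a frame into horizontal bands and lets one std::thread handle each band. It assumes (hypothetically) that frames are 8-bit grayscale buffers and that the per-frame work is a sum of absolute differences (SAD) between two frames; the Frame struct and the function names are made up for illustration and are not taken from Rouhi's program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical frame layout: 8-bit grayscale (luma) samples, row-major.
struct Frame {
    int width;
    int height;
    std::vector<std::uint8_t> luma;  // width * height samples
};

// Sum of absolute differences between two frames, restricted to rows [rowBegin, rowEnd).
static std::uint64_t sadRows(const Frame& a, const Frame& b, int rowBegin, int rowEnd) {
    std::uint64_t sum = 0;
    for (int y = rowBegin; y < rowEnd; ++y) {
        for (int x = 0; x < a.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * a.width + x;
            sum += static_cast<std::uint64_t>(std::abs(int(a.luma[i]) - int(b.luma[i])));
        }
    }
    return sum;
}

// Fine-grained parallelism: split the frame into horizontal bands,
// scan each band on its own thread, then add up the partial sums.
std::uint64_t parallelSad(const Frame& a, const Frame& b, unsigned numThreads) {
    assert(a.width == b.width && a.height == b.height);
    assert(numThreads > 0);
    std::vector<std::uint64_t> partial(numThreads, 0);  // one slot per thread, no mutex needed
    std::vector<std::thread> workers;
    const int n = static_cast<int>(numThreads);
    const int rowsPerBand = (a.height + n - 1) / n;
    for (unsigned t = 0; t < numThreads; ++t) {
        const int rowBegin = std::min(a.height, static_cast<int>(t) * rowsPerBand);
        const int rowEnd = std::min(a.height, rowBegin + rowsPerBand);
        workers.emplace_back([&, t, rowBegin, rowEnd] {
            partial[t] = sadRows(a, b, rowBegin, rowEnd);
        });
    }
    for (std::thread& w : workers) w.join();

    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```

Each thread writes only to its own slot in partial, so no locking is needed, and the results are combined after join(). To use all cores you could pass std::thread::hardware_concurrency(), but that call may return 0, so clamp it to at least 1.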
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 5th September 2011 at 13:25.
6th September 2011, 04:18   #3
Rouhi
Registered User
 
Join Date: Apr 2011
Posts: 64
I extract the indexes offline, so the time for extracting the indexes from each video is not critical for me (at the moment). The problem is in searching and finding sequence matches. TRECVID has more than 8000 videos and more than 1600 queries, and handling the similarity detection of each query against all 8000 videos is very time-consuming. I am looking for a solution to this problem.
Thanks for mentioning coarse-grained vs. fine-grained threading, but my problem is that I am very new to multi-threaded programming. Can you suggest any shortcut for upgrading a sequential program to a multi-threaded one? If none exists, what do you suggest for getting started with multi-threading in C++?
6th September 2011, 10:23   #4
7ekno
Guest
 
Posts: n/a
As above, only you know your search algorithm; nobody can help you multi-thread it without knowing it in detail themselves ...

The first step to multi-threading is identifying the parts of your algorithm that are truly independent of any other operations, the parts that are partially independent (whose results might need to be stored in a buffer/variable), and the parts that completely rely on other parts ...

7ek
7th September 2011, 01:27   #5
LoRd_MuldeR
Software Developer
 
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
So if I understand correctly, you first extract some kind of "features" from the videos in your database. As this is done beforehand, it is not time-critical.

Then you have a bunch of "queries" and for each query you need to find the videos that match the query - which is done by comparing the query to the feature vectors extracted before.

Are your feature vectors stored in a "flat" structure and you simply compare each query to every feature vector?

If so, you might be able to speed this up easily by processing several queries in parallel, because they can be processed independently. Each thread would simply process one query.
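A minimal C++11 sketch of this per-query approach, assuming (hypothetically) that each index is a fixed-length float vector, that similarity is squared Euclidean distance, and that only the single best match per query is wanted. Rather than starting one thread per query (1600+ threads at once), a fixed pool of workers pulls the next query index from a std::atomic counter. Feature, Match and the function names are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical index format: one fixed-length float vector per entry.
using Feature = std::vector<float>;

struct Match {
    std::size_t index;  // position of the best-matching vector in the database
    float distance;     // squared Euclidean distance to the query
};

static float squaredDistance(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Sequential "flat" search: compare one query against every feature vector.
static Match bestMatch(const Feature& query, const std::vector<Feature>& database) {
    Match best = {0, std::numeric_limits<float>::infinity()};
    for (std::size_t i = 0; i < database.size(); ++i) {
        const float d = squaredDistance(query, database[i]);
        if (d < best.distance) best = Match{i, d};
    }
    return best;
}

// Coarse-grained parallelism: queries are independent, so a fixed pool of
// worker threads pulls the next unprocessed query index from an atomic counter.
std::vector<Match> searchAllQueries(const std::vector<Feature>& queries,
                                    const std::vector<Feature>& database,
                                    unsigned numThreads) {
    assert(numThreads > 0);
    std::vector<Match> results(queries.size());
    std::atomic<std::size_t> next(0);
    auto worker = [&] {
        for (std::size_t q = next++; q < queries.size(); q = next++) {
            // Each result slot is written by exactly one thread; the database is read-only.
            results[q] = bestMatch(queries[q], database);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; ++t) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
    return results;
}
```

Because the database is only read and each result slot has a single writer, no mutex is needed. A reasonable call would be searchAllQueries(queries, database, std::max(1u, std::thread::hardware_concurrency())).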

Another option would be parallelizing the compare step itself: Divide the list of feature vectors into n sub-lists. Then, for each query, start n threads. Each thread will handle one sub-list.
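The sub-list variant could look roughly like this, under the same hypothetical assumptions (float feature vectors, squared Euclidean distance, best match only): the database is cut into n contiguous ranges, each thread finds the best match in its range, and the n partial winners are reduced to one. Names are again illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Same hypothetical setup: float feature vectors, squared Euclidean distance.
using Feature = std::vector<float>;

struct Match {
    std::size_t index;
    float distance;
};

static float squaredDistance(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Best match within the sub-list database[begin, end).
static Match bestInRange(const Feature& query, const std::vector<Feature>& database,
                         std::size_t begin, std::size_t end) {
    Match best = {begin, std::numeric_limits<float>::infinity()};
    for (std::size_t i = begin; i < end; ++i) {
        const float d = squaredDistance(query, database[i]);
        if (d < best.distance) best = Match{i, d};
    }
    return best;
}

// Parallel compare step for ONE query: split the database into n contiguous
// sub-lists, scan each on its own thread, then reduce the n partial winners.
Match parallelBestMatch(const Feature& query, const std::vector<Feature>& database, unsigned n) {
    assert(n > 0);
    const Match none = {0, std::numeric_limits<float>::infinity()};
    std::vector<Match> partial(n, none);
    std::vector<std::thread> workers;
    const std::size_t chunk = (database.size() + n - 1) / n;
    for (unsigned t = 0; t < n; ++t) {
        const std::size_t lo = std::min(database.size(), t * chunk);
        const std::size_t hi = std::min(database.size(), lo + chunk);
        workers.emplace_back([&, t, lo, hi] { partial[t] = bestInRange(query, database, lo, hi); });
    }
    for (std::thread& w : workers) w.join();

    // Reduce in sub-list order with a strict '<', so ties go to the lowest index.
    Match best = partial[0];
    for (unsigned t = 1; t < n; ++t) {
        if (partial[t].distance < best.distance) best = partial[t];
    }
    return best;
}
```

Ties resolve to the lowest index, the same as a sequential scan. Starting n threads for every query has a cost, so with 1600+ queries the per-query approach is probably the simpler win; this variant is more useful when a single query must be answered quickly.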


But, regardless of multi-threading, you should think about organizing your feature vectors in a more "optimized" structure, so you don't have to compare the query against all of them!

I read a paper that suggests mapping the features into a "metric space" and then aggregating the individual feature vectors (indices) into a number of so-called "clusters".

For each "cluster" exactly one feature vector (index) is chosen that represents the cluster best. That one is called a "buoy".

Once you have this structure, you can first compare the query to the buoys to find the clusters that contain "suitable" feature vectors. Then do the "in-depth" search in these clusters only...
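Below is a minimal C++11 sketch of a buoy-style search. It is not the method from that paper (which isn't named here) but a common metric-space technique, under these assumptions: features are float vectors, the metric is plain Euclidean distance (which obeys the triangle inequality), the buoys are simply handed in by the caller, each vector joins the cluster of its nearest buoy, and each cluster stores its radius. At query time every member m of a cluster satisfies d(q, m) >= d(q, buoy) - radius, so a cluster whose bound already exceeds the best distance found so far can be skipped. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Feature = std::vector<float>;

// Plain Euclidean distance is a true metric (it obeys the triangle
// inequality), which is what makes the pruning in nearestWithBuoys() safe.
static float euclidean(const Feature& a, const Feature& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

struct Cluster {
    std::size_t buoy;                  // database index of the representative vector
    std::vector<std::size_t> members;  // database indices of all vectors in the cluster
    float radius;                      // largest buoy-to-member distance
};

// Offline step: put every vector into the cluster of its nearest buoy.
// Choosing good buoys is up to the caller (here they are simply passed in).
std::vector<Cluster> buildClusters(const std::vector<Feature>& db,
                                   const std::vector<std::size_t>& buoys) {
    assert(!buoys.empty());
    std::vector<Cluster> clusters;
    for (std::size_t b : buoys) clusters.push_back(Cluster{b, std::vector<std::size_t>(), 0.0f});
    for (std::size_t i = 0; i < db.size(); ++i) {
        std::size_t nearest = 0;
        float nearestDist = euclidean(db[i], db[buoys[0]]);
        for (std::size_t c = 1; c < buoys.size(); ++c) {
            const float d = euclidean(db[i], db[buoys[c]]);
            if (d < nearestDist) { nearest = c; nearestDist = d; }
        }
        clusters[nearest].members.push_back(i);
        clusters[nearest].radius = std::max(clusters[nearest].radius, nearestDist);
    }
    return clusters;
}

// Query step: compare the query with the buoys only, visit clusters from the
// closest buoy outward, and skip a cluster when even its best possible member
// (d(q, buoy) - radius, by the triangle inequality) cannot beat the best so far.
std::size_t nearestWithBuoys(const Feature& query, const std::vector<Feature>& db,
                             const std::vector<Cluster>& clusters) {
    std::vector<std::pair<float, std::size_t>> order;  // (distance to buoy, cluster id)
    for (std::size_t c = 0; c < clusters.size(); ++c)
        order.push_back(std::make_pair(euclidean(query, db[clusters[c].buoy]), c));
    std::sort(order.begin(), order.end());

    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::infinity();
    for (const auto& entry : order) {
        const Cluster& cluster = clusters[entry.second];
        if (entry.first - cluster.radius > bestDist) continue;  // pruned: no member can win
        for (std::size_t m : cluster.members) {  // "in-depth" search inside this cluster
            const float d = euclidean(query, db[m]);
            if (d < bestDist) { best = m; bestDist = d; }
        }
    }
    return best;
}
```

Up to floating-point rounding, this returns the same nearest neighbour as a full scan. With well-separated clusters most of them are pruned after a single buoy comparison; with poorly chosen buoys the search degrades towards a full scan, so how the buoys are picked matters a lot.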

Last edited by LoRd_MuldeR; 7th September 2011 at 01:36.
All times are GMT +1.