I am trying to create an application that downloads images from a website on multiple threads, as an introduction to threading (I have never used threading properly before).
But currently it seems to create 1000+ threads and I am not sure where they are coming from.
I first queue a work item on the thread pool; for starters I have only one job in the Jobs array:
foreach (Job j in Jobs)
{
    ThreadPool.QueueUserWorkItem(Download, j);
}
This starts void Download(object obj) on a new thread, where it loops through a certain number of pages (images needed / 42 images per page):
for (var i = 0; i
{
response = e.Result;
ProcessPage(response, false, j);
};
}
catch (System.Exception e)
{
// Unity editor equivalent of console.writeline
Debug.Log(e);
}
}
}
Correct me if I am wrong, but the next method gets called on the same thread:
void ProcessPage(string response, bool secondPass, Job j)
{
    var wc = new WebClient();
    LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
    foreach (LinkItem i in linkResponse)
    {
        if (secondPass)
        {
            if (string.IsNullOrEmpty(i.Href))
                continue;
            else if (i.Href.Contains("http://loreipsum."))
            {
                if (DownloadImage(i.Href, ID(i.Href)))
                    j.Downloaded++;
            }
        }
        else
        {
            if (i.Href.Contains(";id="))
            {
                var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
                ProcessPage(alterResponse, true, j);
            }
        }
    }
}
Finally, it passes control to the last function, which downloads the actual image:
bool DownloadImage(string target, int id)
{
    var url = new System.Uri(target);
    var fi = new System.IO.FileInfo(url.AbsolutePath);
    var ext = fi.Extension;
    if (!string.IsNullOrEmpty(ext))
    {
        using (var wc = new WebClient())
        {
            try
            {
                wc.DownloadFileAsync(url, id + ext);
                return true;
            }
            catch(System.Exception e)
            {
                if (DEBUG) Debug.Log(e);
            }
        }
    }
    else
    {
        Debug.Log("Returned Without a extension: " + url + " || " + fi.FullName);
        return false;
    }
    return true;
}
I am not sure how I am starting this many threads, but would love to know.
Edit
The goal of this program is to download the different jobs in Jobs at the same time (a maximum of 5), each downloading a maximum of 42 images at a time.
So at most 210 images (5 jobs x 42 images) can/should be downloading at any one time.
First of all, how did you measure the thread count? Why do you think that you have thousands of them in your application? You are using the ThreadPool, so you don't create the threads yourself, and the ThreadPool wouldn't create such a large number of them for its needs.
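One way to answer the measuring question is to log the counts from inside the application. The sketch below is only an assumption about how that could be done, not something taken from the question: ThreadStats and its method names are made up for illustration, and how complete the process-wide count is under Unity's Mono runtime is something to verify on your platform.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for logging thread counts; not part of the original code.
public static class ThreadStats
{
    // OS threads in the whole process. This includes runtime and editor
    // threads, so it will be larger than the number of threads you started.
    public static int ProcessThreadCount()
    {
        return Process.GetCurrentProcess().Threads.Count;
    }

    // Thread pool worker threads that are busy right now:
    // the pool's worker limit minus the workers still available.
    public static int BusyWorkerThreads()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }
}
```

Logging ThreadStats.BusyWorkerThreads() from inside Download and ProcessPage should show whether the pool is really running hundreds of workers, or whether the number you saw was the pool's limit rather than its actual usage.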
Second, you are mixing synchronous and asynchronous operations in your code. As you can't use the TPL and async/await, let's go through your code and count the units of work you are creating, so you can minimize them. After you do this, the number of queued items in the ThreadPool will decrease and your application will gain the performance you need.
You don't call the SetMaxThreads method in your application, so, according to MSDN:
Maximum Number of Thread Pool Threads
The number of operations that can be queued to the thread pool is limited only by available memory; however, the thread pool limits the number of threads that can be active in the process simultaneously. By default, the limit is 25 worker threads per CPU and 1,000 I/O completion threads.
So you must set the maximum to 5.
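Note that ThreadPool.SetMaxThreads applies to the whole process and returns false, without changing anything, when asked for fewer threads than the machine has processors, so a limit of 5 may be refused. A different technique that still gives the "at most 5 jobs" behaviour is to guard the work with a counting semaphore. The following is a minimal sketch of that alternative, not the SetMaxThreads approach itself; JobLimiter and Queue are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the limit of 5 is the job limit stated in the question.
public static class JobLimiter
{
    private static readonly Semaphore Slots = new Semaphore(5, 5);

    // Queues work on the thread pool, but lets at most 5 callbacks run at once.
    public static void Queue(WaitCallback work, object state)
    {
        ThreadPool.QueueUserWorkItem(delegate(object s)
        {
            Slots.WaitOne();        // block until one of the 5 slots is free
            try
            {
                work(s);
            }
            finally
            {
                Slots.Release();    // free the slot even if work throws
            }
        }, state);
    }
}
```

With this in place the loop in the question would call JobLimiter.Queue(Download, j) instead of ThreadPool.QueueUserWorkItem(Download, j). The trade-off is that jobs waiting for a slot each hold a pool thread while blocked, which is acceptable for a handful of jobs but not for thousands.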
I can't find a place in your code where you check for the 42 images per Job; you are only incrementing the value in the ProcessPage method.
Check the ManagedThreadId in the handler for WebClient.DownloadStringCompleted: does it execute on a different thread or not?
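A small probe along these lines can make that check concrete. This is a sketch under assumptions: ThreadIdProbe, Probe and the report callback are hypothetical names, and the result depends on the environment, because WebClient raises its completion events through the SynchronizationContext captured when the call starts and falls back to a thread pool thread when there is none.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper; not part of the original code.
public static class ThreadIdProbe
{
    // Reports the id of the calling thread and the id of the thread that
    // ends up running the DownloadStringCompleted handler.
    public static void Probe(Uri page, Action<int, int> report)
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;
        var wc = new WebClient();

        // Subscribe before starting the download so the event cannot be missed.
        wc.DownloadStringCompleted += delegate(object sender, DownloadStringCompletedEventArgs e)
        {
            // Raised on success and on failure alike; e.Error tells which.
            int handlerId = Thread.CurrentThread.ManagedThreadId;
            wc.Dispose();
            report(callerId, handlerId);
        };

        wc.DownloadStringAsync(page);   // returns immediately
    }
}
```

Calling it from inside Download with the page URL and a callback that logs both ids shows whether the handler runs on the thread that queued it. If the two ids differ, each page costs an extra unit of work on top of the queued job.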
You are already adding a new item to the ThreadPool queue, so why are you using an asynchronous operation for downloading? Use a synchronous overload, like this:
ProcessPage(wc.DownloadString(downloadLink), false, j);
This will not create another item in the ThreadPool queue, and you won't have a synchronization context switch here.
In ProcessPage your wc variable is never disposed, so you aren't freeing all your resources here. Add a using statement:
void ProcessPage(string response, bool secondPass, Job j)
{
    using (var wc = new WebClient())
    {
        LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
        foreach (LinkItem i in linkResponse)
        {
            if (secondPass)
            {
                if (string.IsNullOrEmpty(i.Href))
                    continue;
                else if (i.Href.Contains("http://loreipsum."))
                {
                    if (DownloadImage(i.Href, ID(i.Href)))
                        j.Downloaded++;
                }
            }
            else
            {
                if (i.Href.Contains(";id="))
                {
                    var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
                    ProcessPage(alterResponse, true, j);
                }
            }
        }
    }
}
In the DownloadImage method you also use an asynchronous download. This also adds an item to the ThreadPool queue; I think you can avoid this and use the synchronous overload here too:
wc.DownloadFile(url, id + ext);
return true;
So, in general, avoid context-switching operations and dispose of your resources properly.
Your wc WebClient will go out of scope (and be disposed by its using block, or garbage collected at some arbitrary point) before the async callback runs. Also, with every async call you have to allow for two returns: the immediate return of the call itself and the later return of the delegated callback. So ProcessPage will have to be handled in two places. The j in the original loop may also go out of scope, depending on where Download is declared relative to that loop.
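If you do want to keep the asynchronous download, the point about the two returns could look roughly like this. It is a sketch under assumptions, not the code from the question: AsyncImageDownload, Start and onDone are hypothetical names, and the WebClient is deliberately kept out of a using block so that it stays alive until the completion handler has run.

```csharp
using System;
using System.ComponentModel;
using System.Net;

// Hypothetical helper; not part of the original code.
public static class AsyncImageDownload
{
    // First return: Start comes back immediately, before any bytes arrive.
    // Second return: onDone is invoked later, with null when the download
    // succeeded or with the exception that made it fail.
    public static void Start(Uri source, string targetPath, Action<Exception> onDone)
    {
        var wc = new WebClient();   // not in a using block on purpose

        wc.DownloadFileCompleted += delegate(object sender, AsyncCompletedEventArgs e)
        {
            wc.Dispose();           // safe now: the transfer has finished or failed
            onDone(e.Error);
        };

        wc.DownloadFileAsync(source, targetPath);
    }
}
```

With this shape, a counter such as j.Downloaded would be incremented inside onDone rather than right after the call, since only the callback knows whether the file actually arrived.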