I am trying to create an application that downloads images from a website on multiple threads, as an introduction to threading. (I have never used threading properly before.)

But currently it seems to create 1000+ threads and I am not sure where they are coming from.


I first queue a work item into the thread pool; for starters I only have 1 job in the Jobs array:


foreach (Job j in Jobs)
    ThreadPool.QueueUserWorkItem(Download, j);

This starts void Download(object obj) on a new thread, where it loops through a certain number of pages (images needed / 42 images per page):

for (var i = 0; i ...

And, correct me if I am wrong, it then calls the next void on the same thread:

void ProcessPage(string response, bool secondPass, Job j)
{
    var wc = new WebClient();
    LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();

    foreach (LinkItem i in linkResponse)
    {
        if (secondPass)
        {
            if (string.IsNullOrEmpty(i.Href)) continue;
            else if (i.Href.Contains("http://loreipsum."))
                if (DownloadImage(i.Href, ID(i.Href)))
                    { /* ... statement missing from the source */ }
        }
        else if (i.Href.Contains(";id="))
        {
            var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
            ProcessPage(alterResponse, true, j);
        }
    }
}

Finally, it passes each link on to the last function, which downloads the actual image:


bool DownloadImage(string target, int id)
{
    var url = new System.Uri(target);
    var fi = new System.IO.FileInfo(url.AbsolutePath);
    var ext = fi.Extension;

    if (!string.IsNullOrEmpty(ext))
    {
        using (var wc = new WebClient())
        {
            try
            {
                wc.DownloadFileAsync(url, id + ext);
                return true;
            }
            catch (System.Exception e)
            {
                if (DEBUG) Debug.Log(e);
            }
        }
    }
    else
    {
        Debug.Log("Returned Without a extension: " + url + " || " + fi.FullName);
        return false;
    }
    return true;
}

I am not sure how I am starting this many threads, but would love to know.



The goal of this program is to process the different jobs in Jobs at the same time (a maximum of 5), each downloading a maximum of 42 images at a time.


So at most 210 images can/should be downloading at any one time.


2 Answers


First of all, how did you measure the thread count? Why do you think that you have a thousand of them in your application? You are using the ThreadPool, so you don't create them yourself, and the ThreadPool wouldn't create such a large number of them for its needs.
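
One way to answer the "how did you measure" question is to ask the runtime directly. The sketch below is only an illustration: the class and method names (`ThreadStats`, `BusyWorkerThreads`, `ProcessThreads`) are made up for this example, and `Process.Threads` may not be fully supported on every runtime, so treat the second number as a rough indicator.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ThreadStats
{
    // Pool workers currently in use = configured maximum minus those still available.
    public static int BusyWorkerThreads()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // OS threads the process reports, which includes threads that are not pool threads.
    public static int ProcessThreads()
    {
        return Process.GetCurrentProcess().Threads.Count;
    }

    public static void Main()
    {
        Console.WriteLine("OS threads in process: " + ProcessThreads());
        Console.WriteLine("Busy pool workers:     " + BusyWorkerThreads());
    }
}
```

Logging both numbers before queuing the jobs and again while they run shows whether the growth really comes from pool workers.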


Second, you are mixing synchronous and asynchronous operations in your code. As you can't use TPL and async/await, let's go through your code and count the units of work you are creating, so you can minimize them. After you do this, the number of queued items in the ThreadPool will decrease and your application will gain the performance you need.

  1. You never call ThreadPool.SetMaxThreads in your application, so, according to MSDN:


    Maximum Number of Thread Pool Threads
    The number of operations that can be queued to the thread pool is limited only by available memory; however, the thread pool limits the number of threads that can be active in the process simultaneously. By default, the limit is 25 worker threads per CPU and 1,000 I/O completion threads.

    So you must set the maximum to 5.


  2. I can't find a place in your code where you check the limit of 42 images per Job; you only increment the value in the ProcessPage method.


  3. Check the ManagedThreadId inside the handler for WebClient.DownloadStringCompleted to see whether it executes on a different thread.

  4. You are already adding a new item to the ThreadPool queue, so why use the asynchronous operation for downloading? Use the synchronous overload, like this:


    ProcessPage(wc.DownloadString(downloadLink), false, j);

    This will not create another item in the ThreadPool queue, and you won't have a synchronization context switch here.


  5. In ProcessPage your wc variable is never disposed, so you aren't freeing all your resources there. Add a using statement:


    void ProcessPage(string response, bool secondPass, Job j)
    {
        using (var wc = new WebClient())
        {
            LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
            foreach (LinkItem i in linkResponse)
            {
                if (secondPass)
                {
                    if (string.IsNullOrEmpty(i.Href)) continue;
                    else if (i.Href.Contains("http://loreipsum."))
                        if (DownloadImage(i.Href, ID(i.Href)))
                            { /* ... statement missing from the source */ }
                }
                else if (i.Href.Contains(";id="))
                {
                    var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
                    ProcessPage(alterResponse, true, j);
                }
            }
        }
    }

  6. In the DownloadImage method you also use the asynchronous download. This also adds an item to the ThreadPool queue, and I think you can avoid it by using the synchronous overload here too:


    wc.DownloadFile(url, id + ext);
    return true; 
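
The ManagedThreadId check from the list above can be tried out in isolation. This is a minimal sketch, assuming a hypothetical helper (`ThreadIdProbe.Run`) that queues a few short work items and records which pool threads ran them; the same `Thread.CurrentThread.ManagedThreadId` expression can be logged inside a DownloadStringCompleted handler.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadIdProbe
{
    // Queues 'items' short work items and returns the distinct managed thread ids that ran them.
    public static HashSet<int> Run(int items)
    {
        var ids = new HashSet<int>();
        int remaining = items;
        using (var done = new ManualResetEvent(items == 0))
        {
            for (int n = 0; n < items; n++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    lock (ids) { ids.Add(Thread.CurrentThread.ManagedThreadId); }
                    Thread.Sleep(10);                      // stand-in for real work
                    if (Interlocked.Decrement(ref remaining) == 0) done.Set();
                });
            }
            done.WaitOne();                                // wait until every item has finished
        }
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine("Caller thread: " + Thread.CurrentThread.ManagedThreadId);
        foreach (int id in Run(20))
            Console.WriteLine("Work ran on pool thread " + id);
    }
}
```

Twenty work items typically run on far fewer than twenty distinct threads, because the pool reuses its workers.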

So, in general, avoid the context-switching operations and dispose of your resources properly.
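
On the limit of 5 from point 1: ThreadPool.SetMaxThreads applies to the whole process and cannot be set lower than the number of processors, so a per-application cap is sometimes easier to express with a Semaphore. The following is a sketch under that assumption, not the asker's code; `BoundedQueue` and `RunAll` are hypothetical names, and each `Action` stands in for one download job.

```csharp
using System;
using System.Threading;

public static class BoundedQueue
{
    // Runs every work item on the ThreadPool but lets at most maxParallel of them
    // execute at once. Returns the highest concurrency that was observed.
    public static int RunAll(Action[] work, int maxParallel)
    {
        if (work.Length == 0) return 0;
        int running = 0, peak = 0, remaining = work.Length;
        object gate = new object();
        using (var slots = new Semaphore(maxParallel, maxParallel))
        using (var allDone = new ManualResetEvent(false))
        {
            foreach (Action item in work)
            {
                Action current = item;        // per-iteration copy for the closure
                slots.WaitOne();              // blocks the caller while all slots are taken
                ThreadPool.QueueUserWorkItem(delegate
                {
                    try
                    {
                        lock (gate) { running++; if (running > peak) peak = running; }
                        current();
                    }
                    finally
                    {
                        lock (gate) { running--; }
                        slots.Release();      // free the slot for the next item
                        if (Interlocked.Decrement(ref remaining) == 0) allDone.Set();
                    }
                });
            }
            allDone.WaitOne();                // wait for the last item before disposing
        }
        return peak;
    }

    public static void Main()
    {
        var work = new Action[12];
        for (int i = 0; i < work.Length; i++)
            work[i] = delegate { Thread.Sleep(50); };   // stand-in for one download job
        Console.WriteLine("Peak concurrency: " + RunAll(work, 5));
    }
}
```

The queuing thread blocks in `slots.WaitOne()` while five items are in flight, so the pool never holds more than five of these jobs at once, regardless of its own maximum.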



Your wc WebClient will go out of scope and may be garbage collected at an arbitrary point before the async callback fires. Also, with every async call you have to allow for both the immediate return and the later return of the delegated function itself, so ProcessPage will have to be handled in two places. Finally, the j in the original loop may be going out of scope, depending on where Download in the original loop is declared.
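
If the asynchronous call is kept, one possible shape for those "two places" is to start the download in one method and do the cleanup in the completion handler. This is a sketch only: `AsyncDownload.Start` and its `onFinished` parameter are hypothetical, and the local temp file in `Main` merely stands in for a real image URL.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Net;
using System.Threading;

public static class AsyncDownload
{
    // Place one: starts the download and returns immediately.
    // Place two: the completion handler, which reports the result and only then
    // disposes the WebClient and signals 'done'.
    public static void Start(Uri source, string targetPath, ManualResetEvent done, Action<Exception> onFinished)
    {
        var wc = new WebClient();   // no using block: DownloadFileAsync returns before the file is written
        wc.DownloadFileCompleted += delegate(object sender, AsyncCompletedEventArgs e)
        {
            try { onFinished(e.Error); }          // e.Error is null on success
            finally { wc.Dispose(); done.Set(); }
        };
        wc.DownloadFileAsync(source, targetPath);
    }

    public static void Main()
    {
        string src = Path.GetTempFileName();      // local stand-in for a remote image
        File.WriteAllText(src, "stand-in for image bytes");
        using (var done = new ManualResetEvent(false))
        {
            Start(new Uri(src), src + ".copy", done, delegate(Exception error)
            {
                Console.WriteLine(error == null ? "finished" : "failed: " + error.Message);
            });
            done.WaitOne();                       // the caller decides when to wait
        }
    }
}
```

Because the handler captures `wc`, the client stays reachable until the callback has run, and nothing disposes it while the transfer is still in progress.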

