
Mastering Caffe for Deep Learning: Configuring the GPU Driver and Installing CUDA

Installing the NVIDIA Driver

Other tutorials usually tell you to install the exact driver version matching your card; I simply installed nvidia-367 (to pair with CUDA 8.0) and it works fine. (As the nvidia-smi output below shows, the PPA actually ended up providing a newer 384.111 build.)
You can also refer to: http://blog.csdn.net/xuzhongxiong/article/details/52717285

root@master# sudo add-apt-repository ppa:xorg-edgers/ppa
root@master# sudo apt-get update
root@master# sudo apt-get install nvidia-367
root@master# sudo apt-get install mesa-common-dev
root@master# sudo apt-get install freeglut3-dev
root@master# nvidia-smi
Sun Feb 11 11:18:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   41C    P5    N/A /  N/A |    129MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If nvidia-smi prints a table like this, the driver installation succeeded.
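Optionally, you can double-check that the kernel module actually loaded (a generic sanity check, not specific to this driver version):

root@master# lsmod | grep nvidia                 # the nvidia kernel module should be listed
root@master# cat /proc/driver/nvidia/version    # reports the loaded driver version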



Installing CUDA 8.0

Download it from https://developer.nvidia.com/cuda-toolkit — be sure to pick version 8.0; the installer is about 1.4 GB.
The installer shows a very long license text; keep pressing ENTER until you can type accept. Since the driver was installed above, answer no when the installer offers to install one; accept the defaults (or answer yes) for everything else.
Then run:

root@master# sudo sh cuda_8.0.27_linux.run
root@master# vim /etc/profile
# append these two lines to /etc/profile:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64
root@master# source /etc/profile
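A quick way to confirm the toolkit is on the PATH (nvcc ships with the toolkit, so it should now resolve and report release 8.0):

root@master# which nvcc        # should print /usr/local/cuda-8.0/bin/nvcc
root@master# nvcc --version    # should report "release 8.0"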



Testing the CUDA Samples

root@master# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
root@master# sudo make
root@master# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 920M"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 2004 MBytes (2101542912 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce 920M
Result = PASS
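deviceQuery only checks device enumeration; the bandwidthTest sample in the same 1_Utilities directory is the usual follow-up to confirm host-device transfers work (it should also end with Result = PASS):

root@master# cd ../bandwidthTest
root@master# sudo make
root@master# ./bandwidthTest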



The Monte Carlo Simulation Sample

The nvcc compile command:

root@master# "/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60
-gencode arch=compute_60,code=compute_60
-o MonteCarloMultiGPU.o -c MonteCarloMultiGPU.cpp
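You do not have to type this invocation by hand: each sample directory ships a Makefile that runs all the compile and link steps (in the CUDA 8.0 samples tree this sample lives under 4_Finance):

root@master# cd /usr/local/cuda-8.0/samples/4_Finance/MonteCarloMultiGPU
root@master# sudo make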

The program (MonteCarloMultiGPU.cpp from the CUDA 8.0 samples):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>     // includes, project
#include <helper_functions.h> // Helper functions (utilities, parsing, timing)
#include <helper_cuda.h>      // helper functions (cuda error checking and initialization)
#include <multithreading.h>

#include "MonteCarlo_common.h"

int *pArgc = NULL;
char **pArgv = NULL;

#ifdef WIN32
#define strcasecmp _strcmpi
#endif

////////////////////////////////////////////////////////////////////////////////
// Common functions
////////////////////////////////////////////////////////////////////////////////
float randFloat(float low, float high)
{
    float t = (float)rand() / (float)RAND_MAX;
    return (1.0f - t) * low + t * high;
}

/// Utility function to tweak problem size for small GPUs
int adjustProblemSize(int GPU_N, int default_nOptions)
{
    int nOptions = default_nOptions;

    // select problem size
    for (int i = 0; i < GPU_N; i++)
    {
        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, i));
        int cudaCores = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor)
                        * deviceProp.multiProcessorCount;

        if (cudaCores <= 32)
        {
            nOptions = (nOptions < cudaCores/2 ? nOptions : cudaCores/2);
        }
    }

    return nOptions;
}

int adjustGridSize(int GPUIndex, int defaultGridSize)
{
    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, GPUIndex));
    int maxGridSize = deviceProp.multiProcessorCount * 40;
    return ((defaultGridSize > maxGridSize) ? maxGridSize : defaultGridSize);
}

///////////////////////////////////////////////////////////////////////////////
// CPU reference functions
///////////////////////////////////////////////////////////////////////////////
extern "C" void MonteCarloCPU(
    TOptionValue &callValue,
    TOptionData optionData,
    float *h_Random,
    int pathN
);

//Black-Scholes formula for call options
extern "C" void BlackScholesCall(
    float &CallResult,
    TOptionData optionData
);

////////////////////////////////////////////////////////////////////////////////
// GPU-driving host thread
////////////////////////////////////////////////////////////////////////////////
//Timer
StopWatchInterface **hTimer = NULL;

static CUT_THREADPROC solverThread(TOptionPlan *plan)
{
    //Init GPU
    checkCudaErrors(cudaSetDevice(plan->device));

    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan->device));

    //Start the timer
    sdkStartTimer(&hTimer[plan->device]);

    // Allocate intermediate memory for MC integrator and initialize
    // RNG states
    initMonteCarloGPU(plan);

    // Main computation
    MonteCarloGPU(plan);

    checkCudaErrors(cudaDeviceSynchronize());

    //Stop the timer
    sdkStopTimer(&hTimer[plan->device]);

    //Shut down this GPU
    closeMonteCarloGPU(plan);

    cudaStreamSynchronize(0);

    printf("solverThread() finished - GPU Device %d: %s\n", plan->device, deviceProp.name);

    CUT_THREADEND;
}

static void multiSolver(TOptionPlan *plan, int nPlans)
{
    // allocate and initialize an array of stream handles
    cudaStream_t *streams = (cudaStream_t *) malloc(nPlans * sizeof(cudaStream_t));
    cudaEvent_t *events = (cudaEvent_t *)malloc(nPlans * sizeof(cudaEvent_t));

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaStreamCreate(&(streams[i])));
        checkCudaErrors(cudaEventCreate(&(events[i])));
    }

    //Init Each GPU
    // In CUDA 4.0 we can call cudaSetDevice multiple times to target each device
    // Set the device desired, then perform initializations on that device
    for (int i = 0; i < nPlans; i++)
    {
        // set the target device to perform initialization on
        checkCudaErrors(cudaSetDevice(plan[i].device));

        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan[i].device));

        // Allocate intermediate memory for MC integrator
        // and initialize RNG state
        initMonteCarloGPU(&plan[i]);
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaDeviceSynchronize());
    }

    //Start the timer
    sdkResetTimer(&hTimer[0]);
    sdkStartTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));

        //Main computations
        MonteCarloGPU(&plan[i], streams[i]);

        checkCudaErrors(cudaEventRecord(events[i], streams[i]));
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        cudaEventSynchronize(events[i]);
    }

    //Stop the timer
    sdkStopTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        closeMonteCarloGPU(&plan[i]);
        checkCudaErrors(cudaStreamDestroy(streams[i]));
        checkCudaErrors(cudaEventDestroy(events[i]));
    }
}

///////////////////////////////////////////////////////////////////////////////
// Main program
///////////////////////////////////////////////////////////////////////////////
#define DO_CPU
#undef DO_CPU

#define PRINT_RESULTS
#undef PRINT_RESULTS

void usage()
{
    printf("--method=[threaded,streamed] --scaling=[strong,weak] [--help]\n");
    printf("Method=threaded: 1 CPU thread for each GPU [default]\n");
    printf("       streamed: 1 CPU thread handles all GPUs (requires CUDA 4.0 or newer)\n");
    printf("Scaling=strong : constant problem size\n");
    printf("        weak   : problem size scales with number of available GPUs [default]\n");
}

int main(int argc, char **argv)
{
    char *multiMethodChoice = NULL;
    char *scalingChoice = NULL;
    bool use_threads = true;
    bool bqatest = false;
    bool strongScaling = false;

    pArgc = &argc;
    pArgv = argv;

    printf("%s Starting...\n\n", argv[0]);

    if (checkCmdLineFlag(argc, (const char **)argv, "qatest"))
    {
        bqatest = true;
    }

    getCmdLineArgumentString(argc, (const char **)argv, "method", &multiMethodChoice);
    getCmdLineArgumentString(argc, (const char **)argv, "scaling", &scalingChoice);

    if (checkCmdLineFlag(argc, (const char **)argv, "h") ||
        checkCmdLineFlag(argc, (const char **)argv, "help"))
    {
        usage();
        exit(EXIT_SUCCESS);
    }

    if (multiMethodChoice == NULL)
    {
        use_threads = false;
    }
    else
    {
        if (!strcasecmp(multiMethodChoice, "threaded"))
        {
            use_threads = true;
        }
        else
        {
            use_threads = false;
        }
    }

    if (use_threads == false)
    {
        printf("Using single CPU thread for multiple GPUs\n");
    }

    if (scalingChoice == NULL)
    {
        strongScaling = false;
    }
    else
    {
        if (!strcasecmp(scalingChoice, "strong"))
        {
            strongScaling = true;
        }
        else
        {
            strongScaling = false;
        }
    }

    //GPU number present in the system
    int GPU_N;
    checkCudaErrors(cudaGetDeviceCount(&GPU_N));
    int nOptions = 8 * 1024;

    nOptions = adjustProblemSize(GPU_N, nOptions);

    // select problem size
    int scale = (strongScaling) ? 1 : GPU_N;
    int OPT_N = nOptions * scale;
    int PATH_N = 262144;

    // initialize the timers
    hTimer = new StopWatchInterface*[GPU_N];

    for (int i = 0; i < GPU_N; i++)
    {
        sdkCreateTimer(&hTimer[i]);
        sdkResetTimer(&hTimer[i]);
    }

    //Input data array
    TOptionData *optionData = new TOptionData[OPT_N];
    //Final GPU MC results
    TOptionValue *callValueGPU = new TOptionValue[OPT_N];
    //"Theoretical" call values by Black-Scholes formula
    float *callValueBS = new float[OPT_N];
    //Solver config
    TOptionPlan *optionSolver = new TOptionPlan[GPU_N];
    //OS thread ID
    CUTThread *threadID = new CUTThread[GPU_N];

    int gpuBase, gpuIndex;
    int i;

    float time;

    double delta, ref, sumDelta, sumRef, sumReserve;

    printf("MonteCarloMultiGPU\n");
    printf("==================\n");
    printf("Parallelization method  = %s\n", use_threads ? "threaded" : "streamed");
    printf("Problem scaling         = %s\n", strongScaling ? "strong" : "weak");
    printf("Number of GPUs          = %d\n", GPU_N);
    printf("Total number of options = %d\n", OPT_N);
    printf("Number of paths         = %d\n", PATH_N);

    printf("main(): generating input data...\n");
    srand(123);

    for (i = 0; i < OPT_N; i++)
    {
        optionData[i].S = randFloat(5.0f, 50.0f);
        optionData[i].X = randFloat(10.0f, 25.0f);
        optionData[i].T = randFloat(1.0f, 5.0f);
        optionData[i].R = 0.06f;
        optionData[i].V = 0.10f;
        callValueGPU[i].Expected   = -1.0f;
        callValueGPU[i].Confidence = -1.0f;
    }

    printf("main(): starting %i host threads...\n", GPU_N);

    //Get option count for each GPU
    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].optionCount = OPT_N / GPU_N;
    }

    //Take into account cases with "odd" option counts
    for (i = 0; i < (OPT_N % GPU_N); i++)
    {
        optionSolver[i].optionCount++;
    }

    //Assign GPU option ranges
    gpuBase = 0;

    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].device     = i;
        optionSolver[i].optionData = optionData   + gpuBase;
        optionSolver[i].callValue  = callValueGPU + gpuBase;
        optionSolver[i].pathN      = PATH_N;
        optionSolver[i].gridSize   = adjustGridSize(optionSolver[i].device, optionSolver[i].optionCount);
        gpuBase += optionSolver[i].optionCount;
    }

    if (use_threads || bqatest)
    {
        //Start CPU thread for each GPU
        for (gpuIndex = 0; gpuIndex < GPU_N; gpuIndex++)
        {
            threadID[gpuIndex] = cutStartThread((CUT_THREADROUTINE)solverThread, &optionSolver[gpuIndex]);
        }

        printf("main(): waiting for GPU results...\n");
        cutWaitForThreads(threadID, GPU_N);

        printf("main(): GPU statistics, threaded\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
            time = sdkGetTimerValue(&hTimer[i]);
            printf("Total time (ms.): %f\n", time);
            printf("Options per sec.: %f\n", OPT_N / (time * 0.001));
        }

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta     = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref       = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

    if (!use_threads || bqatest)
    {
        multiSolver(optionSolver, GPU_N);

        printf("main(): GPU statistics, streamed\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
        }

        time = sdkGetTimerValue(&hTimer[0]);
        printf("\nTotal time (ms.): %f\n", time);
        printf("\tNote: This is elapsed time for all to compute.\n");
        printf("Options per sec.: %f\n", OPT_N / (time * 0.001));

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta     = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref       = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

#ifdef DO_CPU
    printf("main(): running CPU MonteCarlo...\n");
    TOptionValue callValueCPU;
    sumDelta = 0;
    sumRef   = 0;

    for (i = 0; i < OPT_N; i++)
    {
        MonteCarloCPU(callValueCPU, optionData[i], NULL, PATH_N);
        delta     = fabs(callValueCPU.Expected - callValueGPU[i].Expected);
        ref       = callValueCPU.Expected;
        sumDelta += delta;
        sumRef   += fabs(ref);
        printf("Exp : %f | %f\t", callValueCPU.Expected, callValueGPU[i].Expected);
        printf("Conf: %f | %f\n", callValueCPU.Confidence, callValueGPU[i].Confidence);
    }

    printf("L1 norm: %E\n", sumDelta / sumRef);
#endif

    printf("Shutting down...\n");

    for (int i = 0; i < GPU_N; i++)
    {
        sdkDeleteTimer(&hTimer[i]);
        checkCudaErrors(cudaSetDevice(i));
    }

    delete[] optionSolver;
    delete[] callValueBS;
    delete[] callValueGPU;
    delete[] optionData;
    delete[] threadID;
    delete[] hTimer;

    printf("Test Summary...\n");
    printf("L1 norm        : %E\n", sumDelta / sumRef);
    printf("Average reserve: %f\n", sumReserve);
    printf("\nNOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.\n\n");
    printf(sumReserve > 1.0f ? "Test passed\n" : "Test failed!\n");
    exit(sumReserve > 1.0f ? EXIT_SUCCESS : EXIT_FAILURE);
}
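A quick run, using the flags documented by usage() above (both are optional; threaded and weak are the defaults):

root@master# ./MonteCarloMultiGPU --method=streamed --scaling=strong

The program prices the same options with the Black-Scholes formula as a reference and reports "Test passed" when the average confidence reserve (sumReserve) exceeds 1.0, i.e. the Monte Carlo confidence intervals cover the observed deviations from the analytical values.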

