
Mastering Caffe for Deep Learning: Configuring the GPU Driver and Installing CUDA

Installing the NVIDIA Driver

Other tutorials usually tell you to install the exact driver version matching your card; I simply installed nvidia-367 (to pair with CUDA 8.0) and it works fine. (As the nvidia-smi output below shows, the PPA actually ended up providing a newer 384.111 build.)
You can also refer to: http://blog.csdn.net/xuzhongxiong/article/details/52717285

root@master# sudo add-apt-repository ppa:xorg-edgers/ppa
root@master# sudo apt-get update
root@master# sudo apt-get install nvidia-367
root@master# sudo apt-get install mesa-common-dev
root@master# sudo apt-get install freeglut3-dev
root@master# nvidia-smi
Sun Feb 11 11:18:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   41C    P5    N/A /  N/A |    129MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If nvidia-smi prints a table like this, the driver installation succeeded.
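Optionally, you can double-check that the kernel module actually loaded (a generic sanity check, not specific to this driver version):

root@master# lsmod | grep nvidia                 # the nvidia kernel module should be listed
root@master# cat /proc/driver/nvidia/version    # reports the loaded driver version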



Installing CUDA 8.0

Download it from https://developer.nvidia.com/cuda-toolkit — be sure to pick version 8.0; the installer is about 1.4 GB.
The installer shows a very long license text; keep pressing ENTER until you can type accept. Since the driver was installed above, answer no when the installer offers to install one; accept the defaults (or answer yes) for everything else.
Then run:

root@master# sudo sh cuda_8.0.27_linux.run
root@master# vim /etc/profile
# append these two lines to /etc/profile:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64
root@master# source /etc/profile
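A quick way to confirm the toolkit is on the PATH (nvcc ships with the toolkit, so it should now resolve and report release 8.0):

root@master# which nvcc        # should print /usr/local/cuda-8.0/bin/nvcc
root@master# nvcc --version    # should report "release 8.0"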



Testing the CUDA Samples

root@master# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
root@master# sudo make
root@master# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 920M"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 2004 MBytes (2101542912 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce 920M
Result = PASS
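deviceQuery only checks device enumeration; the bandwidthTest sample in the same 1_Utilities directory is the usual follow-up to confirm host-device transfers work (it should also end with Result = PASS):

root@master# cd ../bandwidthTest
root@master# sudo make
root@master# ./bandwidthTest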



The Monte Carlo Simulation Sample

The nvcc compile command:

root@master# "/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60
-gencode arch=compute_60,code=compute_60
-o MonteCarloMultiGPU.o -c MonteCarloMultiGPU.cpp
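You do not have to type this invocation by hand: each sample directory ships a Makefile that runs all the compile and link steps (in the CUDA 8.0 samples tree this sample lives under 4_Finance):

root@master# cd /usr/local/cuda-8.0/samples/4_Finance/MonteCarloMultiGPU
root@master# sudo make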

The program (MonteCarloMultiGPU.cpp from the CUDA 8.0 samples):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>     // includes, project
#include <helper_functions.h> // Helper functions (utilities, parsing, timing)
#include <helper_cuda.h>      // helper functions (cuda error checking and initialization)
#include <multithreading.h>

#include "MonteCarlo_common.h"

int *pArgc = NULL;
char **pArgv = NULL;

#ifdef WIN32
#define strcasecmp _strcmpi
#endif

////////////////////////////////////////////////////////////////////////////////
// Common functions
////////////////////////////////////////////////////////////////////////////////
float randFloat(float low, float high)
{
    float t = (float)rand() / (float)RAND_MAX;
    return (1.0f - t) * low + t * high;
}

/// Utility function to tweak problem size for small GPUs
int adjustProblemSize(int GPU_N, int default_nOptions)
{
    int nOptions = default_nOptions;

    // select problem size
    for (int i = 0; i < GPU_N; i++)
    {
        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, i));
        int cudaCores = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor)
                        * deviceProp.multiProcessorCount;

        if (cudaCores <= 32)
        {
            nOptions = (nOptions < cudaCores/2 ? nOptions : cudaCores/2);
        }
    }

    return nOptions;
}

int adjustGridSize(int GPUIndex, int defaultGridSize)
{
    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, GPUIndex));
    int maxGridSize = deviceProp.multiProcessorCount * 40;
    return ((defaultGridSize > maxGridSize) ? maxGridSize : defaultGridSize);
}

///////////////////////////////////////////////////////////////////////////////
// CPU reference functions
///////////////////////////////////////////////////////////////////////////////
extern "C" void MonteCarloCPU(
    TOptionValue &callValue,
    TOptionData optionData,
    float *h_Random,
    int pathN
);

//Black-Scholes formula for call options
extern "C" void BlackScholesCall(
    float &CallResult,
    TOptionData optionData
);

////////////////////////////////////////////////////////////////////////////////
// GPU-driving host thread
////////////////////////////////////////////////////////////////////////////////
//Timer
StopWatchInterface **hTimer = NULL;

static CUT_THREADPROC solverThread(TOptionPlan *plan)
{
    //Init GPU
    checkCudaErrors(cudaSetDevice(plan->device));

    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan->device));

    //Start the timer
    sdkStartTimer(&hTimer[plan->device]);

    // Allocate intermediate memory for MC integrator and initialize
    // RNG states
    initMonteCarloGPU(plan);

    // Main computation
    MonteCarloGPU(plan);

    checkCudaErrors(cudaDeviceSynchronize());

    //Stop the timer
    sdkStopTimer(&hTimer[plan->device]);

    //Shut down this GPU
    closeMonteCarloGPU(plan);

    cudaStreamSynchronize(0);

    printf("solverThread() finished - GPU Device %d: %s\n", plan->device, deviceProp.name);

    CUT_THREADEND;
}

static void multiSolver(TOptionPlan *plan, int nPlans)
{
    // allocate and initialize an array of stream handles
    cudaStream_t *streams = (cudaStream_t *) malloc(nPlans * sizeof(cudaStream_t));
    cudaEvent_t *events = (cudaEvent_t *)malloc(nPlans * sizeof(cudaEvent_t));

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaStreamCreate(&(streams[i])));
        checkCudaErrors(cudaEventCreate(&(events[i])));
    }

    //Init Each GPU
    // In CUDA 4.0 we can call cudaSetDevice multiple times to target each device
    // Set the device desired, then perform initializations on that device
    for (int i = 0; i < nPlans; i++)
    {
        // set the target device to perform initialization on
        checkCudaErrors(cudaSetDevice(plan[i].device));

        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan[i].device));

        // Allocate intermediate memory for MC integrator
        // and initialize RNG state
        initMonteCarloGPU(&plan[i]);
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaDeviceSynchronize());
    }

    //Start the timer
    sdkResetTimer(&hTimer[0]);
    sdkStartTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));

        //Main computations
        MonteCarloGPU(&plan[i], streams[i]);

        checkCudaErrors(cudaEventRecord(events[i], streams[i]));
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        cudaEventSynchronize(events[i]);
    }

    //Stop the timer
    sdkStopTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        closeMonteCarloGPU(&plan[i]);
        checkCudaErrors(cudaStreamDestroy(streams[i]));
        checkCudaErrors(cudaEventDestroy(events[i]));
    }
}

///////////////////////////////////////////////////////////////////////////////
// Main program
///////////////////////////////////////////////////////////////////////////////
#define DO_CPU
#undef DO_CPU

#define PRINT_RESULTS
#undef PRINT_RESULTS

void usage()
{
    printf("--method=[threaded,streamed] --scaling=[strong,weak] [--help]\n");
    printf("Method=threaded: 1 CPU thread for each GPU [default]\n");
    printf("       streamed: 1 CPU thread handles all GPUs (requires CUDA 4.0 or newer)\n");
    printf("Scaling=strong : constant problem size\n");
    printf("        weak   : problem size scales with number of available GPUs [default]\n");
}

int main(int argc, char **argv)
{
    char *multiMethodChoice = NULL;
    char *scalingChoice = NULL;
    bool use_threads = true;
    bool bqatest = false;
    bool strongScaling = false;

    pArgc = &argc;
    pArgv = argv;

    printf("%s Starting...\n\n", argv[0]);

    if (checkCmdLineFlag(argc, (const char **)argv, "qatest"))
    {
        bqatest = true;
    }

    getCmdLineArgumentString(argc, (const char **)argv, "method", &multiMethodChoice);
    getCmdLineArgumentString(argc, (const char **)argv, "scaling", &scalingChoice);

    if (checkCmdLineFlag(argc, (const char **)argv, "h") ||
        checkCmdLineFlag(argc, (const char **)argv, "help"))
    {
        usage();
        exit(EXIT_SUCCESS);
    }

    if (multiMethodChoice == NULL)
    {
        use_threads = false;
    }
    else
    {
        if (!strcasecmp(multiMethodChoice, "threaded"))
        {
            use_threads = true;
        }
        else
        {
            use_threads = false;
        }
    }

    if (use_threads == false)
    {
        printf("Using single CPU thread for multiple GPUs\n");
    }

    if (scalingChoice == NULL)
    {
        strongScaling = false;
    }
    else
    {
        if (!strcasecmp(scalingChoice, "strong"))
        {
            strongScaling = true;
        }
        else
        {
            strongScaling = false;
        }
    }

    //GPU number present in the system
    int GPU_N;
    checkCudaErrors(cudaGetDeviceCount(&GPU_N));
    int nOptions = 8 * 1024;

    nOptions = adjustProblemSize(GPU_N, nOptions);

    // select problem size
    int scale = (strongScaling) ? 1 : GPU_N;
    int OPT_N = nOptions * scale;
    int PATH_N = 262144;

    // initialize the timers
    hTimer = new StopWatchInterface*[GPU_N];

    for (int i = 0; i < GPU_N; i++)
    {
        sdkCreateTimer(&hTimer[i]);
        sdkResetTimer(&hTimer[i]);
    }

    //Input data array
    TOptionData *optionData = new TOptionData[OPT_N];
    //Final GPU MC results
    TOptionValue *callValueGPU = new TOptionValue[OPT_N];
    //"Theoretical" call values by Black-Scholes formula
    float *callValueBS = new float[OPT_N];
    //Solver config
    TOptionPlan *optionSolver = new TOptionPlan[GPU_N];
    //OS thread ID
    CUTThread *threadID = new CUTThread[GPU_N];

    int gpuBase, gpuIndex;
    int i;

    float time;

    double delta, ref, sumDelta, sumRef, sumReserve;

    printf("MonteCarloMultiGPU\n");
    printf("==================\n");
    printf("Parallelization method  = %s\n", use_threads ? "threaded" : "streamed");
    printf("Problem scaling         = %s\n", strongScaling ? "strong" : "weak");
    printf("Number of GPUs          = %d\n", GPU_N);
    printf("Total number of options = %d\n", OPT_N);
    printf("Number of paths         = %d\n", PATH_N);

    printf("main(): generating input data...\n");
    srand(123);

    for (i = 0; i < OPT_N; i++)
    {
        optionData[i].S = randFloat(5.0f, 50.0f);
        optionData[i].X = randFloat(10.0f, 25.0f);
        optionData[i].T = randFloat(1.0f, 5.0f);
        optionData[i].R = 0.06f;
        optionData[i].V = 0.10f;
        callValueGPU[i].Expected   = -1.0f;
        callValueGPU[i].Confidence = -1.0f;
    }

    printf("main(): starting %i host threads...\n", GPU_N);

    //Get option count for each GPU
    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].optionCount = OPT_N / GPU_N;
    }

    //Take into account cases with "odd" option counts
    for (i = 0; i < (OPT_N % GPU_N); i++)
    {
        optionSolver[i].optionCount++;
    }

    //Assign GPU option ranges
    gpuBase = 0;

    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].device     = i;
        optionSolver[i].optionData = optionData   + gpuBase;
        optionSolver[i].callValue  = callValueGPU + gpuBase;
        optionSolver[i].pathN      = PATH_N;
        optionSolver[i].gridSize   = adjustGridSize(optionSolver[i].device, optionSolver[i].optionCount);
        gpuBase += optionSolver[i].optionCount;
    }

    if (use_threads || bqatest)
    {
        //Start CPU thread for each GPU
        for (gpuIndex = 0; gpuIndex < GPU_N; gpuIndex++)
        {
            threadID[gpuIndex] = cutStartThread((CUT_THREADROUTINE)solverThread, &optionSolver[gpuIndex]);
        }

        printf("main(): waiting for GPU results...\n");
        cutWaitForThreads(threadID, GPU_N);

        printf("main(): GPU statistics, threaded\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
            time = sdkGetTimerValue(&hTimer[i]);
            printf("Total time (ms.): %f\n", time);
            printf("Options per sec.: %f\n", OPT_N / (time * 0.001));
        }

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta     = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref       = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

    if (!use_threads || bqatest)
    {
        multiSolver(optionSolver, GPU_N);

        printf("main(): GPU statistics, streamed\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
        }

        time = sdkGetTimerValue(&hTimer[0]);
        printf("\nTotal time (ms.): %f\n", time);
        printf("\tNote: This is elapsed time for all to compute.\n");
        printf("Options per sec.: %f\n", OPT_N / (time * 0.001));

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta     = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref       = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

#ifdef DO_CPU
    printf("main(): running CPU MonteCarlo...\n");
    TOptionValue callValueCPU;
    sumDelta = 0;
    sumRef   = 0;

    for (i = 0; i < OPT_N; i++)
    {
        MonteCarloCPU(callValueCPU, optionData[i], NULL, PATH_N);
        delta     = fabs(callValueCPU.Expected - callValueGPU[i].Expected);
        ref       = callValueCPU.Expected;
        sumDelta += delta;
        sumRef   += fabs(ref);
        printf("Exp : %f | %f\t", callValueCPU.Expected, callValueGPU[i].Expected);
        printf("Conf: %f | %f\n", callValueCPU.Confidence, callValueGPU[i].Confidence);
    }

    printf("L1 norm: %E\n", sumDelta / sumRef);
#endif

    printf("Shutting down...\n");

    for (int i = 0; i < GPU_N; i++)
    {
        sdkDeleteTimer(&hTimer[i]);
        checkCudaErrors(cudaSetDevice(i));
    }

    delete[] optionSolver;
    delete[] callValueBS;
    delete[] callValueGPU;
    delete[] optionData;
    delete[] threadID;
    delete[] hTimer;

    printf("Test Summary...\n");
    printf("L1 norm        : %E\n", sumDelta / sumRef);
    printf("Average reserve: %f\n", sumReserve);
    printf("\nNOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.\n\n");
    printf(sumReserve > 1.0f ? "Test passed\n" : "Test failed!\n");
    exit(sumReserve > 1.0f ? EXIT_SUCCESS : EXIT_FAILURE);
}
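A quick run, using the flags documented by usage() above (both are optional; threaded and weak are the defaults):

root@master# ./MonteCarloMultiGPU --method=streamed --scaling=strong

The program prices the same options with the Black-Scholes formula as a reference and reports "Test passed" when the average confidence reserve (sumReserve) exceeds 1.0, i.e. the Monte Carlo confidence intervals cover the observed deviations from the analytical values.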

