
Mastering Caffe for Deep Learning: Configuring the GPU Driver and Installing CUDA

Installing the NVIDIA Driver

Most other guides tell you to install the exact driver version that matches your card. I simply installed nvidia-367 (to pair with CUDA 8.0), and that works as well.
You can also refer to: http://blog.csdn.net/xuzhongxiong/article/details/52717285

root@master# sudo add-apt-repository ppa:xorg-edgers/ppa
root@master# sudo apt-get update
root@master# sudo apt-get install nvidia-367
root@master# sudo apt-get install mesa-common-dev
root@master# sudo apt-get install freeglut3-dev
root@master# nvidia-smi
Sun Feb 11 11:18:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   41C    P5    N/A /  N/A |    129MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If nvidia-smi prints a table like this, the driver was installed successfully.



Installing CUDA 8.0

https://developer.nvidia.com/cuda-toolkit
Make sure you select version 8.0; the download is about 1.4 GB.
The installer displays a very long license agreement; keep pressing ENTER until it asks you to type accept. The driver was already installed above, so decline when the installer offers to install its bundled driver; accept the defaults (or answer yes) for everything else.
Then run:

root@master# sudo sh cuda_8.0.27_linux.run
root@master# vim /etc/profile
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
root@master# source /etc/profile
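Before building anything larger, it is worth confirming that nvcc is on the PATH and that the runtime can see the GPU. The following is a minimal sanity check of my own (the file name check_cuda.cpp is arbitrary), using only documented CUDA runtime calls:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driverVersion = 0, runtimeVersion = 0, deviceCount = 0;

    // Both version queries come from the CUDA runtime API
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);

    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess)
    {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Driver version : %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
    printf("Runtime version: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf("Devices found  : %d\n", deviceCount);
    return 0;
}

Compile and run it with nvcc -o check_cuda check_cuda.cpp && ./check_cuda; if /etc/profile was sourced correctly, the versions reported should match what nvidia-smi and deviceQuery print.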



Testing the CUDA Samples

root@master# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
root@master# sudo make
root@master# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 920M"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 2004 MBytes (2101542912 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce 920M
Result = PASS
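deviceQuery is essentially a pretty-printer over the runtime call cudaGetDeviceProperties(). If you only need a few of the fields above, say the compute capability used to choose the -gencode flags in the next section, a stripped-down query is enough. A minimal sketch of mine, using only the CUDA runtime API:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; dev++)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        // The same fields deviceQuery prints, without the formatting
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
        printf("  Global memory      : %zu MBytes\n", prop.totalGlobalMem / (1024 * 1024));
        printf("  Multiprocessors    : %d\n", prop.multiProcessorCount);
    }
    return 0;
}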



The Monte Carlo sample (MonteCarloMultiGPU)

The nvcc compile command:

root@master# "/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60
-gencode arch=compute_60,code=compute_60
-o MonteCarloMultiGPU.o -c MonteCarloMultiGPU.cpp
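Each -gencode pair embeds code for one target architecture: compute_XX names the virtual (PTX) architecture and sm_XX the real one, so the command above produces a fat binary covering everything from Fermi (sm_20) through Pascal (sm_60). The GeForce 920M used here reports compute capability 3.5 in deviceQuery, so for this machine alone a single -gencode arch=compute_35,code=sm_35 would suffice. A minimal kernel for checking that the flags match your GPU (my own test file, hello_sm35.cu, not part of the samples):

#include <cstdio>

// Device-side printf requires compute capability 2.0 or higher
__global__ void hello()
{
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    hello<<<2, 4>>>();  // 2 blocks of 4 threads
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}

Build it with /usr/local/cuda-8.0/bin/nvcc -gencode arch=compute_35,code=sm_35 -o hello hello_sm35.cu and run ./hello; if the architecture flags do not match the GPU, the launch typically fails here with an "invalid device function" error.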

The program (MonteCarloMultiGPU.cpp):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>     // includes, project
#include <helper_functions.h> // Helper functions (utilities, parsing, timing)
#include <helper_cuda.h>      // helper functions (cuda error checking and initialization)
#include <multithreading.h>

#include "MonteCarlo_common.h"

int *pArgc = NULL;
char **pArgv = NULL;

#ifdef WIN32
#define strcasecmp _strcmpi
#endif

////////////////////////////////////////////////////////////////////////////////
// Common functions
////////////////////////////////////////////////////////////////////////////////
float randFloat(float low, float high)
{
    float t = (float)rand() / (float)RAND_MAX;
    return (1.0f - t) * low + t * high;
}

/// Utility function to tweak problem size for small GPUs
int adjustProblemSize(int GPU_N, int default_nOptions)
{
    int nOptions = default_nOptions;

    // select problem size
    for (int i = 0; i < GPU_N; i++)
    {
        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, i));
        int cudaCores = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor)
                        * deviceProp.multiProcessorCount;

        if (cudaCores <= 32)
        {
            nOptions = (nOptions < cudaCores / 2 ? nOptions : cudaCores / 2);
        }
    }

    return nOptions;
}

int adjustGridSize(int GPUIndex, int defaultGridSize)
{
    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, GPUIndex));
    int maxGridSize = deviceProp.multiProcessorCount * 40;
    return ((defaultGridSize > maxGridSize) ? maxGridSize : defaultGridSize);
}

///////////////////////////////////////////////////////////////////////////////
// CPU reference functions
///////////////////////////////////////////////////////////////////////////////
extern "C" void MonteCarloCPU(
    TOptionValue &callValue,
    TOptionData optionData,
    float *h_Random,
    int pathN
);

// Black-Scholes formula for call options
extern "C" void BlackScholesCall(
    float &CallResult,
    TOptionData optionData
);

////////////////////////////////////////////////////////////////////////////////
// GPU-driving host thread
////////////////////////////////////////////////////////////////////////////////
// Timer
StopWatchInterface **hTimer = NULL;

static CUT_THREADPROC solverThread(TOptionPlan *plan)
{
    // Init GPU
    checkCudaErrors(cudaSetDevice(plan->device));

    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan->device));

    // Start the timer
    sdkStartTimer(&hTimer[plan->device]);

    // Allocate intermediate memory for MC integrator and initialize
    // RNG states
    initMonteCarloGPU(plan);

    // Main computation
    MonteCarloGPU(plan);

    checkCudaErrors(cudaDeviceSynchronize());

    // Stop the timer
    sdkStopTimer(&hTimer[plan->device]);

    // Shut down this GPU
    closeMonteCarloGPU(plan);

    cudaStreamSynchronize(0);

    printf("solverThread() finished - GPU Device %d: %s\n", plan->device, deviceProp.name);

    CUT_THREADEND;
}

static void multiSolver(TOptionPlan *plan, int nPlans)
{
    // allocate and initialize an array of stream handles
    cudaStream_t *streams = (cudaStream_t *) malloc(nPlans * sizeof(cudaStream_t));
    cudaEvent_t *events = (cudaEvent_t *) malloc(nPlans * sizeof(cudaEvent_t));

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaStreamCreate(&(streams[i])));
        checkCudaErrors(cudaEventCreate(&(events[i])));
    }

    // Init Each GPU
    // In CUDA 4.0 we can call cudaSetDevice multiple times to target each device
    // Set the device desired, then perform initializations on that device
    for (int i = 0; i < nPlans; i++)
    {
        // set the target device to perform initialization on
        checkCudaErrors(cudaSetDevice(plan[i].device));

        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan[i].device));

        // Allocate intermediate memory for MC integrator
        // and initialize RNG state
        initMonteCarloGPU(&plan[i]);
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaDeviceSynchronize());
    }

    // Start the timer
    sdkResetTimer(&hTimer[0]);
    sdkStartTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));

        // Main computations
        MonteCarloGPU(&plan[i], streams[i]);

        checkCudaErrors(cudaEventRecord(events[i], streams[i]));
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        cudaEventSynchronize(events[i]);
    }

    // Stop the timer
    sdkStopTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        closeMonteCarloGPU(&plan[i]);
        checkCudaErrors(cudaStreamDestroy(streams[i]));
        checkCudaErrors(cudaEventDestroy(events[i]));
    }
}

///////////////////////////////////////////////////////////////////////////////
// Main program
///////////////////////////////////////////////////////////////////////////////
#define DO_CPU
#undef DO_CPU

#define PRINT_RESULTS
#undef PRINT_RESULTS

void usage()
{
    printf("--method=[threaded,streamed] --scaling=[strong,weak] [--help]\n");
    printf("Method=threaded: 1 CPU thread for each GPU [default]\n");
    printf("       streamed: 1 CPU thread handles all GPUs (requires CUDA 4.0 or newer)\n");
    printf("Scaling=strong : constant problem size\n");
    printf("        weak   : problem size scales with number of available GPUs [default]\n");
}

int main(int argc, char **argv)
{
    char *multiMethodChoice = NULL;
    char *scalingChoice = NULL;
    bool use_threads = true;
    bool bqatest = false;
    bool strongScaling = false;

    pArgc = &argc;
    pArgv = argv;

    printf("%s Starting...\n\n", argv[0]);

    if (checkCmdLineFlag(argc, (const char **)argv, "qatest"))
    {
        bqatest = true;
    }

    getCmdLineArgumentString(argc, (const char **)argv, "method", &multiMethodChoice);
    getCmdLineArgumentString(argc, (const char **)argv, "scaling", &scalingChoice);

    if (checkCmdLineFlag(argc, (const char **)argv, "h") ||
        checkCmdLineFlag(argc, (const char **)argv, "help"))
    {
        usage();
        exit(EXIT_SUCCESS);
    }

    if (multiMethodChoice == NULL)
    {
        use_threads = false;
    }
    else
    {
        if (!strcasecmp(multiMethodChoice, "threaded"))
        {
            use_threads = true;
        }
        else
        {
            use_threads = false;
        }
    }

    if (use_threads == false)
    {
        printf("Using single CPU thread for multiple GPUs\n");
    }

    if (scalingChoice == NULL)
    {
        strongScaling = false;
    }
    else
    {
        if (!strcasecmp(scalingChoice, "strong"))
        {
            strongScaling = true;
        }
        else
        {
            strongScaling = false;
        }
    }

    // GPU number present in the system
    int GPU_N;
    checkCudaErrors(cudaGetDeviceCount(&GPU_N));
    int nOptions = 8 * 1024;

    nOptions = adjustProblemSize(GPU_N, nOptions);

    // select problem size
    int scale = (strongScaling) ? 1 : GPU_N;
    int OPT_N = nOptions * scale;
    int PATH_N = 262144;

    // initialize the timers
    hTimer = new StopWatchInterface *[GPU_N];

    for (int i = 0; i < GPU_N; i++)
    {
        sdkCreateTimer(&hTimer[i]);
        sdkResetTimer(&hTimer[i]);
    }

    // Input data array
    TOptionData *optionData = new TOptionData[OPT_N];
    // Final GPU MC results
    TOptionValue *callValueGPU = new TOptionValue[OPT_N];
    // "Theoretical" call values by Black-Scholes formula
    float *callValueBS = new float[OPT_N];
    // Solver config
    TOptionPlan *optionSolver = new TOptionPlan[GPU_N];
    // OS thread ID
    CUTThread *threadID = new CUTThread[GPU_N];

    int gpuBase, gpuIndex;
    int i;

    float time;

    double delta, ref, sumDelta, sumRef, sumReserve;

    printf("MonteCarloMultiGPU\n");
    printf("==================\n");
    printf("Parallelization method  = %s\n", use_threads ? "threaded" : "streamed");
    printf("Problem scaling         = %s\n", strongScaling ? "strong" : "weak");
    printf("Number of GPUs          = %d\n", GPU_N);
    printf("Total number of options = %d\n", OPT_N);
    printf("Number of paths         = %d\n", PATH_N);

    printf("main(): generating input data...\n");
    srand(123);

    for (i = 0; i < OPT_N; i++)
    {
        optionData[i].S = randFloat(5.0f, 50.0f);
        optionData[i].X = randFloat(10.0f, 25.0f);
        optionData[i].T = randFloat(1.0f, 5.0f);
        optionData[i].R = 0.06f;
        optionData[i].V = 0.10f;
        callValueGPU[i].Expected   = -1.0f;
        callValueGPU[i].Confidence = -1.0f;
    }

    printf("main(): starting %i host threads...\n", GPU_N);

    // Get option count for each GPU
    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].optionCount = OPT_N / GPU_N;
    }

    // Take into account cases with "odd" option counts
    for (i = 0; i < (OPT_N % GPU_N); i++)
    {
        optionSolver[i].optionCount++;
    }

    // Assign GPU option ranges
    gpuBase = 0;

    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].device     = i;
        optionSolver[i].optionData = optionData + gpuBase;
        optionSolver[i].callValue  = callValueGPU + gpuBase;
        optionSolver[i].pathN      = PATH_N;
        optionSolver[i].gridSize   = adjustGridSize(optionSolver[i].device, optionSolver[i].optionCount);
        gpuBase += optionSolver[i].optionCount;
    }

    if (use_threads || bqatest)
    {
        // Start CPU thread for each GPU
        for (gpuIndex = 0; gpuIndex < GPU_N; gpuIndex++)
        {
            threadID[gpuIndex] = cutStartThread((CUT_THREADROUTINE)solverThread, &optionSolver[gpuIndex]);
        }

        printf("main(): waiting for GPU results...\n");
        cutWaitForThreads(threadID, GPU_N);

        printf("main(): GPU statistics, threaded\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
            time = sdkGetTimerValue(&hTimer[i]);
            printf("Total time (ms.): %f\n", time);
            printf("Options per sec.: %f\n", OPT_N / (time * 0.001));
        }

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta     = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref       = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

    if (!use_threads || bqatest)
    {
        multiSolver(optionSolver, GPU_N);

        printf("main(): GPU statistics, streamed\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
        }

        time = sdkGetTimerValue(&hTimer[0]);
        printf("\nTotal time (ms.): %f\n", time);
        printf("\tNote: This is elapsed time for all to compute.\n");
        printf("Options per sec.: %f\n", OPT_N / (time * 0.001));

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta     = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref       = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

#ifdef DO_CPU
    printf("main(): running CPU MonteCarlo...\n");
    TOptionValue callValueCPU;
    sumDelta = 0;
    sumRef   = 0;

    for (i = 0; i < OPT_N; i++)
    {
        MonteCarloCPU(
            callValueCPU,
            optionData[i],
            NULL,
            PATH_N
        );
        delta     = fabs(callValueCPU.Expected - callValueGPU[i].Expected);
        ref       = callValueCPU.Expected;
        sumDelta += delta;
        sumRef   += fabs(ref);
        printf("Exp : %f | %f\t", callValueCPU.Expected, callValueGPU[i].Expected);
        printf("Conf: %f | %f\n", callValueCPU.Confidence, callValueGPU[i].Confidence);
    }

    printf("L1 norm: %E\n", sumDelta / sumRef);
#endif

    printf("Shutting down...\n");

    for (int i = 0; i < GPU_N; i++)
    {
        sdkDeleteTimer(&hTimer[i]);
        checkCudaErrors(cudaSetDevice(i));
    }

    delete[] optionSolver;
    delete[] callValueGPU;
    delete[] callValueBS;
    delete[] optionData;
    delete[] threadID;
    delete[] hTimer;

    printf("Test Summary...\n");
    printf("L1 norm        : %E\n", sumDelta / sumRef);
    printf("Average reserve: %f\n", sumReserve);
    printf("\nNOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.\n\n");
    printf(sumReserve > 1.0f ? "Test passed\n" : "Test failed!\n");
    exit(sumReserve > 1.0f ? EXIT_SUCCESS : EXIT_FAILURE);
}
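To see the math without the multi-GPU plumbing: each option is a European call, priced by averaging discounted payoffs over simulated geometric Brownian motion paths, and the Black-Scholes formula gives the closed-form value the estimate is checked against. Below is a single-threaded CPU sketch of the same estimator (my own illustration, not the sample's MonteCarloCPU(); the parameter values are picked from the same ranges the sample draws from):

#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    // One option; the sample draws S in [5,50], X in [10,25], T in [1,5]
    const float S = 20.0f, X = 15.0f, T = 2.0f, R = 0.06f, V = 0.10f;
    const int pathN = 262144;  // same number of paths as PATH_N above

    std::mt19937 rng(123);
    std::normal_distribution<float> gauss(0.0f, 1.0f);

    double sum = 0.0;
    for (int i = 0; i < pathN; i++)
    {
        // Terminal price of one GBM path, then the discounted call payoff
        float ST = S * std::exp((R - 0.5f * V * V) * T + V * std::sqrt(T) * gauss(rng));
        sum += std::exp(-R * T) * std::fmax(ST - X, 0.0f);
    }

    printf("Monte Carlo call price: %f\n", sum / pathN);
    return 0;
}

The GPU version distributes OPT_N such options across the available devices and evaluates the paths for each option in parallel, but the estimator itself is identical.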

