Hacker News

Being limited by VRAM is a huge constraint for me. Even if it's slower, being able to load 100GB+ into RAM without any batching headaches is worth a lot.

Unless cuDF has implemented some clever dask+cudf kind of setup that can intelligently push data in and out of the GPU as required?



There is dask-cudf, which gets a lot of the way there.

https://docs.rapids.ai/api/dask-cudf/stable/
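
For anyone wondering what that looks like in practice, here's a minimal sketch (the file paths and column names are made up) of pairing dask-cudf with dask-cuda so partitions are streamed through the GPU and spilled to host RAM, instead of having to fit in VRAM all at once:

    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    import dask_cudf

    # Spill device memory to host RAM once it passes this threshold, so a
    # dataset much larger than VRAM can still be processed.
    cluster = LocalCUDACluster(device_memory_limit="4GB")
    client = Client(cluster)

    # Read a directory of Parquet files as many GPU-backed partitions; only a
    # handful of partitions are resident on the device at any one time.
    ddf = dask_cudf.read_parquet("data/*.parquet")

    # Build the aggregation lazily, then execute it partition by partition.
    result = ddf.groupby("user_id")["amount"].sum().compute()
    print(result.head())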


Last time I used it, Dask was a lot worse than simple manual batching.


This is huge, this was my only gripe with cudf!


I converted 500GB of data using dask_cudf on a GTX 1060 with 6GB of VRAM, and it was faster than a 20-node m3.xlarge cluster.

What you can do on even consumer GPUs is mind-blowing.
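
Not the parent's exact pipeline, but a rough sketch of that kind of out-of-core conversion (the paths, blocksize value, and column name are placeholders):

    import dask_cudf

    # Read the source CSVs in ~256MB chunks so each partition fits comfortably
    # in a 6GB card; chunks are streamed through the GPU one at a time.
    ddf = dask_cudf.read_csv("raw/*.csv", blocksize="256MB")

    # Example of a cheap per-partition transform done on the GPU.
    ddf["amount"] = ddf["amount"].astype("float32")

    # Write everything back out as Parquet; nothing close to the full dataset
    # is ever resident on the device at once.
    ddf.to_parquet("converted/")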


How does it perform when it comes to plotting such large datasets? Can I use matplotlib?


Correct me if I'm wrong, but with recent GPUs you should just be able to memory-map a file on "disk" into the GPU address space?

Open the file, mmap it, and then import the host pointer through the GPU API (CUDA or Vulkan). Paging should work as expected, but "stupid" access patterns hurt throughput.

This requires a GPU, a CPU, and a kernel driver that support it. For example, my laptop's Intel GPU exposes the Vulkan host-pointer-import API, but last I checked it does not accept pointers imported from mmap.
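
A rough sketch of that route from Python, using a file-backed mapping plus the CUDA runtime through ctypes. Whether cudaHostRegister accepts file-backed pages at all depends on the GPU, driver, and OS, exactly as noted above, so treat this as illustrative rather than portable; the file name is made up:

    import ctypes
    import numpy as np

    cudart = ctypes.CDLL("libcudart.so")  # CUDA runtime; assumes it's on the loader path

    # Map the file into the host address space; the kernel pages it in on demand.
    arr = np.memmap("big_dataset.bin", dtype=np.uint8, mode="r")
    host_ptr = ctypes.c_void_p(arr.ctypes.data)
    nbytes = ctypes.c_size_t(arr.nbytes)

    # Register the file-backed pages as pinned, mapped host memory
    # (cudaHostRegisterMapped == 2). This is exactly the step a given GPU/driver
    # combination may refuse for mmap'd memory; on newer CUDA a read-only
    # mapping may also need the cudaHostRegisterReadOnly flag (0x08).
    CUDA_HOST_REGISTER_MAPPED = 2
    rc = cudart.cudaHostRegister(host_ptr, nbytes, CUDA_HOST_REGISTER_MAPPED)
    assert rc == 0, f"cudaHostRegister failed: {rc}"

    # Get the device-side alias of the same pages. A kernel reading through it
    # faults pages in from disk, so sequential scans are fine while random
    # access patterns will crawl.
    dev_ptr = ctypes.c_void_p()
    rc = cudart.cudaHostGetDevicePointer(ctypes.byref(dev_ptr), host_ptr, 0)
    assert rc == 0, f"cudaHostGetDevicePointer failed: {rc}"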


There's a profiler cell magic for notebooks which helps identify when you run out of VRAM (it reports what ran on the CPU and what ran on the GPU). There's an open PR to turn on low-VRAM reporting as a diagnostic. cuDF is impressive, but getting a working setup can be a PITA (and then if you upgrade libraries...). Personally, I think it fits in the production pipeline for obvious bottlenecks on well-tended configurations; using it in the R&D flow might cost diagnostic time getting and keeping it working (YMMV, etc.)
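
Assuming the magic meant here is the cudf.pandas one (my guess; the comment doesn't name it), it looks roughly like this in a notebook:

    # Cell 1: load the accelerator before importing pandas.
    %load_ext cudf.pandas
    import pandas as pd

    # Cell 2:
    %%cudf.pandas.profile
    # The report printed after this cell lists which operations ran on the GPU
    # and which fell back to the CPU.
    df = pd.read_parquet("events.parquet")  # hypothetical file
    df.groupby("user_id").size()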



