ComfyUI speed-up notes (collected from GitHub)


These notes gather speed-related tips, benchmarks, and user reports about ComfyUI from GitHub issues, READMEs, and commit logs.

General observations and user reports

- On older hardware, ComfyUI is currently the better option for SDXL work. This will change over time, hopefully quickly.
- Batching offers a reasonable speed boost; one user always renders with a batch size of 3.
- The speed loss from CPU offloading comes from transferring data back and forth between system RAM and VRAM, plus the extra read/write operations. VRAM capacity isn't really the issue; it's getting the data to the cores fast enough, and big VRAM is just the best current solution. 8 GB of system RAM is barely enough for Windows to run smoothly, and even with a higher-VRAM card, everything still needs to be backed into system memory.
- With the arrival of Flux, even 24 GB cards are maxed out, and models have to be swapped in and out during image creation, which is slow. One user reported that flux-dev made their OS lag and stutter permanently and made loading all the models in the workflow extremely slow.
- Generation speed drops significantly with each added LoRA: one LoRA caused no noticeable drop (on a Q8 GGUF model), two or more dropped the speed several times, and four LoRAs dropped it roughly 3x. A later update fixed this, at least on Q8.
- There have been a number of big changes to the ComfyUI core recently that should improve performance across the board, but there might still be bugs that slow things down. One user had very slow generation when using AutoCFG, and the only fix was to roll back ComfyUI, but the rollback broke other custom nodes (which were later fixed). Another, after a year of use, saw generation go from 15-20 seconds per image to minutes. Updating ComfyUI, Forge, and all extensions to the latest versions resolves many such regressions; one user on a 6 GB GTX 1660 Super, updating daily, reported the speed staying exactly the same in every generation.
- A fully updated ComfyUI Windows portable setup (13900K, 32 GB RAM, Windows 11, RTX 4090, newest drivers) worked fine with SD 1.5 and 2.0 flows, but SDXL loaded the checkpoints at about 19 GB of VRAM, climbed to 24 GB when running a prompt, and stayed at 24 GB after the prompt finished (or was cancelled) until the ComfyUI command prompt was closed.
- Other reports: since updating on September 3rd, one user's generations became extremely slow; another measured inference about 20% slower than expected, with VRAM usage lower as well; a full workflow with FaceID took 60 seconds before drawing even started, with every node running very slowly; another wondered whether a warning or their 6 GB of VRAM was to blame for a slow node; and one consistently got very blurred Flux images (dev and schnell) regardless of sampler or other parameters.
- Image saving can be slow too: with every save option set to false, one user still saw extremely slow save speed and suspected the filename-prefix loop or the repeated regex; an experiment with saving batches asynchronously and rewriting the date metadata post-save never got the filename/date handling right.
- For comparison with A1111: one user got 1.7 seconds for a 512x512, 20-step Euler image in auto1111 but 3 seconds in ComfyUI with the same settings, a pretty big slowdown; for SDXL, a 3060 (12 GB) in Automatic1111 generated a 20 base-step, 10 refiner-step 1024x1024 "Euler a" image in just a few seconds over a minute.
- The A100 doesn't support the fp8 types, and presumably at some point TransformerEngine will get ported to Windows.
- Launch speed also differs by environment: one user's GPU cloud service took ~40 seconds to launch ComfyUI versus ~10 seconds locally, even though the local install had far more custom nodes. On vast.ai, pip uses only one CPU core during installation, which can take up to 2 hours depending on the instance.
- After installing the beta version of desktop ComfyUI, one user started performance testing and first noticed the UI recognizing their 120 Hz display.
- Recent core commits target speed directly: "Speed up TAESD preview", "Speed up Sharpen node", "Speed up fp8 matrix mult by using better code", "Speed up hunyuan dit inference a bit", and "Try to speed up the test-ui workflow".
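Since so many of the regressions above were fixed by later releases, updating the core and every custom node is the first step before debugging speed. A minimal sketch for a git-based install (paths are assumptions; the Windows portable build ships its own update script in its update folder instead):

    # update the ComfyUI core
    cd ComfyUI
    git pull

    # update every custom node that was installed via git clone
    cd custom_nodes
    for d in */; do (cd "$d" && git pull); done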
Useful custom nodes and extensions

- ComfyUI's ControlNet Auxiliary Preprocessors (Fannovel16/comfyui_controlnet_aux) added onnxruntime support to speed up DWPose (see its Q&A).
- ComfyUI-RMBG (1038lab/ComfyUI-RMBG) is a node for advanced background removal and object segmentation using multiple models, including RMBG-2.0, INSPYRENET, BEN, SAM, and GroundingDINO. It offers a good balance between speed and accuracy and is effective on both simple and complex scenes. Like most custom nodes, it installs with a clone into custom_nodes:

      cd ComfyUI/custom_nodes
      git clone https://github.com/1038lab/ComfyUI-RMBG

- comfyorg/comfyui-crop-and-stitch provides nodes that crop before sampling and stitch back after sampling, which speeds up inpainting because only the masked region is sampled. Its parameters: context_expand_pixels grows the context area (i.e. the area for the sampling) around the original mask, in pixels; context_expand_factor grows it as a factor, e.g. 1.1 grows it by 10% of the size of the mask; fill_mask_holes chooses whether to fully fill any holes in the mask; invert_mask fully inverts the mask.
- T-GATE can bring a 10%-50% speed-up for different diffusion models; it only slightly reduces the quality of the generated images and maintains the original composition.
- ComfyUI Flux Accelerator can generate images up to 37.25% faster than the default settings; examples tested on an RTX 4090 include 512x512 at 4 steps going from 0.51 s to 0.32 s (37.25% faster).
- nonnonstop/comfyui-faster-loading speeds up the loading of checkpoints with ComfyUI.
- T8star1984/comfyui-purgevram can be added after any node to clean up VRAM and memory.
- A (very) advanced and (very) experimental node lets you iteratively change the block weights of Flux models and check the difference each value makes.
- A pan-navigation custom "node" enables panning the canvas with the arrow keys, with a customizable pan speed in ComfyUI's Settings, under the "codecringebinge" subsection of the Settings dialog's left panel.
- For a faster or slower canvas zoom, open ComfyUI\web\lib\litegraph.core.js in a text editor, search for "scale *= 1.1", and replace the 1.1 with a larger number like 1.15 for a faster zooming speed, or a smaller number like 1.05 for a slower one.
- ccssu/ComfyUI-Workflows-Speedup collects sped-up workflows; kijai/ComfyUI-HunyuanVideoWrapper wraps HunyuanVideo.
Compilation with stable-fast

The stable-fast optimizer has a ComfyUI extension: https://github.com/gameltb/ComfyUI_stable_fast. Cloning it into the custom_nodes folder is all the "enabling" it needs; some monkey patching is used for the current implementation. FreeU and PatchModelAddDownscale are now supported experimentally; just use the ComfyUI nodes normally. Its torch compile path only works if you have torch v1.13, and since ComfyUI relies on other things that require torch 2 it is not possible to activate; the speed-up is not that great anyway, so it is better to deactivate torch compile. The project's roadmap lists converting the model using stable-fast (estimated speed-up: 2x), training an LCM LoRA for the denoise UNet (estimated speed-up: 5x), and optionally training a new model on a better dataset to improve result quality.

The general caveat with ahead-of-time compilation is that it doesn't support ControlNet, GLiGEN, or the other fun and fancy stuff, and LoRAs need to be baked into the compiled "program". If you chain LoRAs, you begin accumulating a multiplicative number of variants of the same model, each with a huge chain of LoRA weights depending on what was selected for that run, and each needing pre-compilation.

TensorRT

NVIDIA's TensorRT offers another compilation route; see https://developer.nvidia.com/blog/unlock-faster-image-generation-in-stable-diffusion-web-ui-with-nvidia-tensorrt/. One user's benchmark of an SD 1.5 model (realisticvisionV51) at 512x768: base speed 5 it/s with a ~4.1 GB model; TensorRT static engine 8.2 it/s with a ~1.7 GB engine (a 64% speed increase); TensorRT dynamic engine 7.9-8 it/s, also ~1.7 GB (a 60% speed increase). The same post goes on to benchmark an SDXL model at 768x1024.
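One thread asked whether the stable-fast extension has to be enabled after a git clone; it doesn't, the clone is enough. A sketch (the clone URL comes from the link above; installing the stable-fast library itself into ComfyUI's Python environment is a separate step described in that repo's README):

    cd ComfyUI/custom_nodes
    git clone https://github.com/gameltb/ComfyUI_stable_fast
    # then restart ComfyUI so the new nodes are picked up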
Precision and quantization

- Try using an fp16 model config in the CheckpointLoader node. That should speed things up a bit on newer cards.
- "flux1-dev-bnb-nf4", loaded through the "bitsandbytes_NF4" custom node, is a Flux model claimed to be nearly 4 times faster than the Flux Dev version and 3 times faster than Flux Schnell. In practice, one user generating at 1024x1024 measured the single-file fp8 version at about 14 GB of VRAM with a peak of 31 GB of RAM, and the nf4 version at about 12.7 GB of VRAM with a peak of about 16 GB of RAM; both ran at about the same speed, so the memory savings were smaller than expected.
- GGUF quantizations (city96/ComfyUI-GGUF, with thanks to city96 for active development) are another option, though one user asked how to speed them up after a 6 GB GGUF model took 34 seconds for a 4-step image versus 18-19 seconds for a comparable 6 GB UNet model.

Attention optimizations

- One user disabled xformers out of curiosity and used PyTorch cross attention, expecting a total collapse in performance; instead the speed turned out to be the same.
- Try the "HyperTiling" node under _nodes_for_testing: it's not obvious, but hypertiling is an attention optimization that improves on xformers and similar, and it increases speed as the image size increases. One such tweak was reported to have a very slight hit on inference speed and zero hit on memory use, with initial tests indicating it's absolutely worth using.

Launch flags and environment variables

- With everything up to date, these arguments help: --use-pytorch-cross-attention --fast --highvram --dont-upcast-attention. After several tests with a clean installation and a perfectly configured environment, one user found that only --highvram made a difference. With the right flags, ComfyUI should be at least as fast as the a1111 UI.
- Running ComfyUI with --disable-cuda-malloc may make it possible to optimize the speed further.
- You can also try setting the environment variable PYTORCH_TUNABLEOP_ENABLED=1, which might speed things up at the cost of a very slow initial run.
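A sketch combining those flags with the tuning variable (Linux-style shell; flag availability varies between ComfyUI versions, so check python main.py --help first):

    cd ComfyUI
    # the first run will be very slow while TunableOp searches for fast kernels
    PYTORCH_TUNABLEOP_ENABLED=1 python main.py \
        --use-pytorch-cross-attention --fast --highvram --dont-upcast-attention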
CFG tricks

- The latest "automatic CFG" update gives up to a 28.5% speed increase (its README quotes up to 28.5% faster generation speed than normal, plus negative weighting). In short: turning off the guidance makes the steps go twice as fast, and it can be done without any loss in quality when the sigmas are low enough (~1). An added "no uncond" node completely disables the negative prompt and doubles the speed while rescaling the latent space in the post-CFG function up until the sigmas are at 1; it also adds a 30% speed increase. If you get an error, update your ComfyUI.
- As one README puts it, CFG is the temperature of your oven: a thermostat that ensures the image is always cooked the way you want.

Graph execution

- Only parts of the graph that have an output with all the correct inputs will be executed, so incomplete branches cost nothing.
- One user found generation speed itself unchanged after an update, but it took ages to even get to that stage. Another, who updated the GGUF loader and ComfyUI at the same time, wasn't sure which was to blame for a slowdown.

Multiple GPUs

- A requested feature is allowing memory to split across GPUs; if you have two GPUs this would be a massive speed-up whenever a model doesn't fit in one card. A related issue asks whether multiple GPUs can make up for a lack of VRAM and boost process speed (comfyanonymous/ComfyUI#2879).
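ComfyUI doesn't currently split a single model across GPUs, but separate instances can at least be pinned to separate cards. A sketch, assuming the --cuda-device and --port flags in ComfyUI's CLI (the port numbers here are arbitrary):

    # instance 1 on GPU 0
    python main.py --cuda-device 0 --port 8188
    # instance 2 on GPU 1, in a second terminal
    python main.py --cuda-device 1 --port 8189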
Node parameters and miscellaneous

- Some wrapper nodes expose speed-relevant inputs: use_kv_cache enables the KV cache to speed up inference, seed is a random seed for generating output, and control_after_generate sets how the seed value changes every time it runs.
- smthemex/ComfyUI_InstantIR_Wrapper lets you use InstantIR (Blind Image Restoration with Instant Generative Reference) to upscale images in ComfyUI.
- One high-resolution upscaling approach's main disadvantage against the alternatives is that it is relatively slow and VRAM hungry, since it requires multiple iterations at high resolution, whereas Deep Shrink and HiDiffusion actually speed up generation while the scaling effect is active.
- Wondering why two versions of the same program run at such different speeds, one user looked at how Automatic1111 starts up on a Mac: it runs with --no-half --skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate.
- Training speed varies too: one FluxTrain user asked whether particular command lines were used in training, since their average speed of 3.7 s/it at batch size 1 was notably slower with the same parameters.
- Removing bfloat16 from the supported inference dtypes, i.e. changing supported_inference_dtypes = [torch.float16, torch.bfloat16, torch.float32] to [torch.float16, torch.float32] as @Exploder98 suggested, increased one user's speed by 50%.
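Before editing dtype lists like that, it's worth confirming what the GPU actually supports; torch.cuda.is_bf16_supported() is a standard PyTorch call, so a quick check is:

    python -c "import torch; print(torch.cuda.is_available(), torch.cuda.is_bf16_supported())"

If bf16 is reported as supported, the 50% gain above may not reproduce; the tweak presumably helps most where bfloat16 runs much slower than float16.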
Installation and launch

Follow the ComfyUI manual installation instructions for Windows and Linux, then install the ComfyUI dependencies (if you have another Stable Diffusion UI you might be able to reuse them) and launch with:

    python main.py --force-fp16

Note that --force-fp16 will only work if you installed the latest pytorch nightly (as of when that instruction was written). To see the GUI, go to the local address printed in the console.

On the Windows portable build, find your ComfyUI main directory (usually something like C:\ComfyUI_windows_portable) and just put your arguments in the run_nvidia_gpu.bat file: open the .bat file with Notepad, make your changes, and save.

The README's keybindings also save time: Ctrl + Enter queues up the current graph for generation, Ctrl + Shift + Enter queues it first, Ctrl + Alt + Enter cancels the current generation, and Ctrl + Z / Ctrl + Y undo and redo.
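For example, run_nvidia_gpu.bat typically contains a single launch line like the one below (the exact contents vary between releases, so treat this as a sketch); extra arguments are appended to that line:

    .\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --fast --highvram
    pause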