Commit 743adea

james77777778, divyashreepathihalli, sachinprasadhs, mattdangerw, and sineeli authored
Add StableDiffusion3 (#1820)
* Add VGG16 backbone (#1737) * Agg Vgg16 backbone * update names * update tests * update test * add image classifier * incorporate review comments * Update test case * update backbone test * add image classifier * classifier cleanup * code reformat * add vgg16 image classifier * make vgg generic * update doc string * update docstring * add classifier test * update tests * update docstring * address review comments * code reformat * update the configs * address review comments * fix task saved model test * update init * code reformatted * Add `ResNetBackbone` and `ResNetImageClassifier` (#1765) * Add ResNetV1 and ResNetV2 * Address comments * Add CSP DarkNet backbone and classifier (#1774) * Add CSP DarkNet * Add CSP DarkNet * snake_case function names * change use_depthwise to block_type * Add `FeaturePyramidBackbone` and port weights from `timm` for `ResNetBackbone` (#1769) * Add FeaturePyramidBackbone and update ResNetBackbone * Simplify the implementation * Fix CI * Make ResNetBackbone compatible with timm and add FeaturePyramidBackbone * Add conversion implementation * Update docstrings * Address comments * Add DenseNet (#1775) * Add DenseNet * fix testcase * address comments * nit * fix lint errors * move description * Add ViTDetBackbone (#1776) * add vit det vit_det_backbone * update docstring * code reformat * fix tests * address review comments * bump year on all files * address review comments * rename backbone * fix tests * change back to ViT * address review comments * update image shape * Add Mix transformer (#1780) * Add MixTransformer * fix testcase * test changes and comments * lint fix * update config list * modify testcase for 2 layers * update input_image_shape -> image_shape (#1785) * update input_image_shape -> image_shape * update docstring example * code reformat * update tests * Create __init__.py (#1788) add missing __init__ file to vit_det * Hack package build script to rename to keras-hub (#1793) This is a temporary way to test out the 
keras-hub branch. - Does a global rename of all symbols during package build. - Registers the "old" name on symbol export for saving compat. - Adds a github action to publish every commit to keras-hub as a new package. - Removes our descriptions on PyPI temporarily, until we want to message this more broadly. * Add CLIP and T5XXL for StableDiffusionV3 (#1790) * Add `CLIPTokenizer`, `T5XXLTokenizer`, `CLIPTextEncoder` and `T5XXLTextEncoder`. * Make CLIPTextEncoder as Backbone * Add `T5XXLPreprocessor` and remove `T5XXLTokenizer` Add `CLIPPreprocessor` * Use `tf = None` at the top * Replace manual implementation of `CLIPAttention` with `MultiHeadAttention` * Add Bounding Box Utils (#1791) * Bounding box utils * - Correct test cases * - Remove hard tensorflow dtype * - fix api gen * - Fix import for test cases - Use setup for converters test case * - fix api_gen issue * - FIx api gen * - Fix api gen error * - Correct test cases as per new api changes * mobilenet_v3 added in keras-nlp (#1782) * mobilenet_v3 added in keras-nlp * minor bug fixed in mobilenet_v3_backbone * formatting corrected * refactoring backbone * correct_pad_downsample method added * refactoring backbone * parameters updated * Testcaseupdated, expected output shape corrected * code formatted with black * testcase updated * refactoring and description added * comments updated * added mobilenet v1 and v2 * merge conflict resolved * version arg removed, and config options added * input_shape changed to image_shape in arg * config updated * input shape corrected * comments resolved * activation function format changed * minor bug fixed * minor bug fixed * added vision_backbone_test * channel_first bug resolved * channel_first cases working * comments resolved * formatting fixed * refactoring --------- Co-authored-by: ushareng <[email protected]> * Pkgoogle/efficient net migration (#1778) * migrating efficientnet models to keras-hub * merging changes from other sources * autoformatting pass * initial 
consolidation of efficientnet_backbone * most updates and removing separate implementation * cleanup, autoformatting, keras generalization * removed layer examples outside of effiicient net * many, mainly documentation changes, small test fixes * Add the ResNet_vd backbone (#1766) * Add ResNet_vd to ResNet backbone * Addressed requested parameter changes * Fixed tests and updated comments * Added new parameters to docstring * Add `VAEImageDecoder` for StableDiffusionV3 (#1796) * Add `VAEImageDecoder` for StableDiffusionV3 * Use `keras.Model` for `VAEImageDecoder` and follows the coding style in `VAEAttention` * Replace `Backbone` with `keras.Model` in `CLIPTextEncoder` and `T5XXLTextEncoder` (#1802) * Add pyramid output for densenet, cspDarknet (#1801) * add pyramid outputs * fix testcase * format fix * make common testcase for pyramid outputs * change default shape * simplify testcase * test case change and add channel axis * Add `MMDiT` for StableDiffusionV3 (#1806) * Add `MMDiT` * Update * Update * Update implementation * Add remaining bbox utils (#1804) * - Add formats, iou, utils for bounding box * - Add `AnchorGenerator`, `BoxMatcher` and `NonMaxSupression` layers * - Remove scope_name not required. * use default keras name scope * - Correct format error * - Remove layers as of now and keep them at model level till keras core supports them * - Correct api_gen * Fix timm conversion for rersnet (#1814) * Add `StableDiffusion3` * Fix `_normalize_inputs` * Separate CLIP encoders from SD3 backbone. * Simplify `text_to_image` function. * Address comments * Minor update and add docstrings. 
* Fix * Update * Rename to diffuser and decoder * Define functional model * Merge from upstream/master * Delete old SD3 * Fix copyright * Rename to keras_hub * Address comments * Update * Fix CI * Fix bugs occurred in keras3.1 --------- Co-authored-by: Divyashree Sreepathihalli <[email protected]> Co-authored-by: Sachin Prasad <[email protected]> Co-authored-by: Matt Watson <[email protected]> Co-authored-by: Siva Sravana Kumar Neeli <[email protected]> Co-authored-by: Usha Rengaraju <[email protected]> Co-authored-by: ushareng <[email protected]> Co-authored-by: pkgoogle <[email protected]> Co-authored-by: gowthamkpr <[email protected]>
1 parent 3fbbeea commit 743adea

27 files changed

+2582
-870
lines changed

keras_hub/api/models/__init__.py

Lines changed: 13 additions & 0 deletions
@@ -66,6 +66,8 @@
 from keras_hub.src.models.bloom.bloom_tokenizer import BloomTokenizer
 from keras_hub.src.models.causal_lm import CausalLM
 from keras_hub.src.models.causal_lm_preprocessor import CausalLMPreprocessor
+from keras_hub.src.models.clip.clip_preprocessor import CLIPPreprocessor
+from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
 from keras_hub.src.models.csp_darknet.csp_darknet_backbone import (
     CSPDarkNetBackbone,
 )
@@ -260,14 +262,25 @@
 from keras_hub.src.models.sam.sam_image_segmenter import SAMImageSegmenter
 from keras_hub.src.models.seq_2_seq_lm import Seq2SeqLM
 from keras_hub.src.models.seq_2_seq_lm_preprocessor import Seq2SeqLMPreprocessor
+from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_backbone import (
+    StableDiffusion3Backbone,
+)
+from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_text_to_image import (
+    StableDiffusion3TextToImage,
+)
+from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_text_to_image_preprocessor import (
+    StableDiffusion3TextToImagePreprocessor,
+)
 from keras_hub.src.models.t5.t5_backbone import T5Backbone
+from keras_hub.src.models.t5.t5_preprocessor import T5Preprocessor
 from keras_hub.src.models.t5.t5_tokenizer import T5Tokenizer
 from keras_hub.src.models.task import Task
 from keras_hub.src.models.text_classifier import TextClassifier
 from keras_hub.src.models.text_classifier import TextClassifier as Classifier
 from keras_hub.src.models.text_classifier_preprocessor import (
     TextClassifierPreprocessor,
 )
+from keras_hub.src.models.text_to_image import TextToImage
 from keras_hub.src.models.vgg.vgg_backbone import VGGBackbone
 from keras_hub.src.models.vgg.vgg_image_classifier import VGGImageClassifier
 from keras_hub.src.models.vit_det.vit_det_backbone import ViTDetBackbone

keras_hub/api/tokenizers/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
 from keras_hub.src.models.bart.bart_tokenizer import BartTokenizer
 from keras_hub.src.models.bert.bert_tokenizer import BertTokenizer
 from keras_hub.src.models.bloom.bloom_tokenizer import BloomTokenizer
+from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
 from keras_hub.src.models.deberta_v3.deberta_v3_tokenizer import (
     DebertaV3Tokenizer,
 )

keras_hub/src/models/stable_diffusion_v3/clip_encoder_block.py renamed to keras_hub/src/models/clip/clip_encoder_block.py

Lines changed: 8 additions & 2 deletions
@@ -11,6 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+from keras import dtype_policies
 from keras import layers
 from keras import ops
 
@@ -43,7 +44,7 @@ def __init__(
             intermediate_activation = quick_gelu
 
         self.layer_norm_1 = layers.LayerNormalization(
-            epsilon=0.00001, dtype=self.dtype_policy, name="layer_norm_1"
+            epsilon=1e-5, dtype="float32", name="layer_norm_1"
         )
         self.attention = layers.MultiHeadAttention(
             num_heads,
@@ -52,7 +53,7 @@ def __init__(
             name="attention",
         )
         self.layer_norm_2 = layers.LayerNormalization(
-            epsilon=0.00001, dtype=self.dtype_policy, name="layer_norm_2"
+            epsilon=1e-5, dtype="float32", name="layer_norm_2"
         )
         self.dense_1 = layers.Dense(
             self.intermediate_dim, dtype=self.dtype_policy, name="dense_1"
@@ -67,6 +68,11 @@ def __init__(
     def build(self, input_shape):
         self.layer_norm_1.build(input_shape)
         self.attention.build(input_shape, input_shape, input_shape)
+        # Before Keras 3.2, there was no setter for `dtype_policy`. Directly
+        # assign a `DTypePolicy` instead.
+        self.attention._softmax.dtype_policy = dtype_policies.DTypePolicy(
+            "float32"
+        )
         self.layer_norm_2.build(input_shape)
         self.dense_1.build(input_shape)
         input_shape = self.dense_1.compute_output_shape(input_shape)
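
The commit message does not say why the two layer norms and the attention softmax are pinned to `float32`. One plausible motivation, stated here as an assumption rather than the authors' reasoning, is that reductions (sums over many elements) lose precision in half precision. The sketch below uses only the standard library to emulate float16 rounding; `to_half` and `half_sum` are hypothetical helper names and are not part of Keras or KerasHub.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_sum(n: int) -> float:
    """Add 1.0 to an accumulator n times, rounding to float16 after each step."""
    acc = 0.0
    for _ in range(n):
        acc = to_half(acc + 1.0)
    return acc


# From 2048 upward, adjacent float16 values are 2 apart, so adding 1.0
# rounds back to 2048 and the running sum stops growing.
print(half_sum(4096))  # 2048.0
```

A float32 accumulator would reach 4096 exactly in the same loop, which is the kind of gap that keeping normalization and softmax in `float32` would avoid.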
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
+# Copyright 2024 The KerasHub Authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import keras
+
+from keras_hub.src.api_export import keras_hub_export
+from keras_hub.src.layers.preprocessing.start_end_packer import StartEndPacker
+from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
+from keras_hub.src.models.preprocessor import Preprocessor
+from keras_hub.src.utils.tensor_utils import preprocessing_function
+
+try:
+    import tensorflow as tf
+except ImportError:
+    tf = None
+
+
+@keras_hub_export("keras_hub.models.CLIPPreprocessor")
+class CLIPPreprocessor(Preprocessor):
+    """CLIP preprocessing layer which tokenizes and packs inputs.
+
+    This preprocessing layer will do 2 things:
+
+    - Tokenize the inputs using the `tokenizer`.
+    - Construct a dictionary with keys `"token_ids"`, `"padding_mask"`.
+
+    This layer can be used directly with `tf.data.Dataset.map` to preprocess
+    string data in the `(x, y, sample_weight)` format used by
+    `keras.Model.fit`.
+
+    The call method of this layer accepts three arguments, `x`, `y`, and
+    `sample_weight`. `x` can be a python string or tensor representing a single
+    segment, a list of python strings representing a batch of single segments,
+    or a list of tensors representing multiple segments to be packed together.
+    `y` and `sample_weight` are both optional, can have any format, and will be
+    passed through unaltered.
+
+    `CLIPPreprocessor` forces the input to have only one segment, as CLIP is
+    mainly used for generation tasks. For tasks having multi-segment inputs
+    like "glue/mnli", please use a model designed for classification purposes
+    such as BERT or RoBERTa.
+
+    Args:
+        tokenizer: A `keras_hub.models.CLIPTokenizer` instance.
+        sequence_length: The length of the packed inputs.
+        add_start_token: If `True`, the preprocessor will prepend the tokenizer
+            start token to each input sequence.
+        add_end_token: If `True`, the preprocessor will append the tokenizer
+            end token to each input sequence.
+        to_lower: bool. Whether to lower the inputs.
+
+    Call arguments:
+        x: A string, `tf.Tensor` or list of python strings.
+        y: Any label data. Will be passed through unaltered.
+        sample_weight: Any label weight data. Will be passed through unaltered.
+        sequence_length: Pass to override the configured `sequence_length` of
+            the layer.
+    """
+
+    # TODO: Add example once we have a CLIP model.
+
+    tokenizer_cls = CLIPTokenizer
+
+    def __init__(
+        self,
+        tokenizer,
+        sequence_length=77,
+        add_start_token=True,
+        add_end_token=True,
+        to_lower=True,
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+        self.tokenizer = tokenizer
+        self.packer = None
+        self.sequence_length = sequence_length
+        self.add_start_token = add_start_token
+        self.add_end_token = add_end_token
+        self.to_lower = to_lower
+
+    def build(self, input_shape):
+        # Defer packer creation to `build()` so that we can be sure tokenizer
+        # assets have loaded when restoring a saved model.
+        self.packer = StartEndPacker(
+            start_value=self.tokenizer.start_token_id,
+            end_value=self.tokenizer.end_token_id,
+            pad_value=self.tokenizer.end_token_id,
+            sequence_length=self.sequence_length,
+            return_padding_mask=True,
+        )
+        self.built = True
+
+    @preprocessing_function
+    def call(
+        self,
+        x,
+        y=None,
+        sample_weight=None,
+        sequence_length=None,
+    ):
+        sequence_length = sequence_length or self.sequence_length
+        if self.to_lower:
+            x = tf.strings.lower(x)
+        token_ids, padding_mask = self.packer(
+            self.tokenizer(x),
+            sequence_length=sequence_length,
+            add_start_value=self.add_start_token,
+            add_end_value=self.add_end_token,
+        )
+        x = {
+            "token_ids": token_ids,
+            "padding_mask": padding_mask,
+        }
+        return keras.utils.pack_x_y_sample_weight(x, y, sample_weight)
+
+    def get_config(self):
+        config = super().get_config()
+        config.update(
+            {
+                "sequence_length": self.sequence_length,
+                "add_start_token": self.add_start_token,
+                "add_end_token": self.add_end_token,
+                "to_lower": self.to_lower,
+            }
+        )
+        return config
+
+    @property
+    def sequence_length(self):
+        """The padded length of model input sequences."""
+        return self._sequence_length
+
+    @sequence_length.setter
+    def sequence_length(self, value):
+        self._sequence_length = value
+        if self.packer is not None:
+            self.packer.sequence_length = value
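
The docstring above describes a tokenize-then-pack step that yields `"token_ids"` and `"padding_mask"`, with the end token doubling as the pad value. As a rough illustration only, the pure-Python sketch below mimics that packing on plain lists. `pack` is a hypothetical helper, not the `StartEndPacker` implementation, and the ids (start 5, end 4, tokens `[1, 2, 1, 3]`) are the toy values that appear in the test file on this page.

```python
def pack(token_ids, sequence_length, start_id, end_id, pad_id,
         add_start=True, add_end=True):
    """Truncate, add start/end ids, then right-pad to sequence_length.

    Returns (packed_ids, padding_mask); the mask is 1 for real tokens
    (including start/end) and 0 for padding.
    """
    # Leave room for the special tokens before truncating the body.
    reserved = int(add_start) + int(add_end)
    body = list(token_ids)[: max(sequence_length - reserved, 0)]
    packed = ([start_id] if add_start else []) + body
    packed += [end_id] if add_end else []
    packed = packed[:sequence_length]
    mask = [1] * len(packed)
    padding = sequence_length - len(packed)
    return packed + [pad_id] * padding, mask + [0] * padding


tokens = [1, 2, 1, 3]  # toy ids for " airplane airport"

# Start and end tokens added, padded with the end token id.
print(pack(tokens, 8, start_id=5, end_id=4, pad_id=4))
# ([5, 1, 2, 1, 3, 4, 4, 4], [1, 1, 1, 1, 1, 1, 0, 0])

# A shorter sequence_length truncates the body but keeps the end token.
print(pack(tokens, 5, start_id=5, end_id=4, pad_id=4)[0])
# [5, 1, 2, 1, 4]
```

Because the pad id equals the end id, the padding mask is the only way to tell the real end token from trailing padding.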

keras_hub/src/models/stable_diffusion_v3/clip_preprocessor_test.py renamed to keras_hub/src/models/clip/clip_preprocessor_test.py

Lines changed: 6 additions & 11 deletions
@@ -13,12 +13,8 @@
 # limitations under the License.
 import pytest
 
-from keras_hub.src.models.stable_diffusion_v3.clip_preprocessor import (
-    CLIPPreprocessor,
-)
-from keras_hub.src.models.stable_diffusion_v3.clip_tokenizer import (
-    CLIPTokenizer,
-)
+from keras_hub.src.models.clip.clip_preprocessor import CLIPPreprocessor
+from keras_hub.src.models.clip.clip_tokenizer import CLIPTokenizer
 from keras_hub.src.tests.test_case import TestCase
 
 
@@ -43,7 +39,7 @@ def test_preprocessor_basics(self):
             input_data=self.input_data,
             expected_output={
                 "token_ids": [[5, 1, 2, 1, 3, 4, 4, 4]],
-                "padding_mask": [[1, 1, 1, 1, 1, 0, 0, 0]],
+                "padding_mask": [[1, 1, 1, 1, 1, 1, 0, 0]],
             },
         )
 
@@ -54,17 +50,16 @@ def test_no_start_end_token(self):
             sequence_length=8,
             add_start_token=False,
             add_end_token=False,
-            pad_with_end_token=False,
         )
         x = preprocessor(input_data)
-        self.assertAllEqual(x["token_ids"], [[1, 2, 1, 3, 0, 0, 0, 0]] * 4)
+        self.assertAllEqual(x["token_ids"], [[1, 2, 1, 3, 4, 4, 4, 4]] * 4)
         self.assertAllEqual(x["padding_mask"], [[1, 1, 1, 1, 0, 0, 0, 0]] * 4)
 
     def test_sequence_length_override(self):
         input_data = " airplane airport"
         preprocessor = CLIPPreprocessor(**self.init_kwargs)
-        x = preprocessor(input_data, sequence_length=4)
-        self.assertAllEqual(x["token_ids"], [5, 1, 2, 1])
+        x = preprocessor(input_data, sequence_length=5)
+        self.assertAllEqual(x["token_ids"], [5, 1, 2, 1, 4])
 
     @pytest.mark.kaggle_key_required
     @pytest.mark.extra_large
