ContentLoader Import problem

Jason E Bailey-2
We've set up a process to import content into our Sling instance and
we're running into a problem with the ContentLoader.
The use case is that a set of data that is managed in another part of
our company is being provided to us in the form of a JSON object for
loading into our Sling environment.
This works. However, the content changes daily: sometimes properties
change, and sometimes the node structure does.
We don't want to use "overwrite" because that causes the entire tree
structure to be deleted, which is really expensive. However, if we
don't use "overwrite", nodes that have been removed from the import
continue to exist.
Effectively, what we need is a delta: delete nodes that aren't in the
import but otherwise leave the existing nodes alone, and do the same
with properties.
Unless I'm missing something, that is not something the importer
supports. Has anyone had to deal with this? Maybe used a different
process?
Thanks
- Jason
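
The delta described above can be pictured with a minimal sketch. This is not something the ContentLoader provides; it assumes both the existing tree and the incoming JSON are modelled as nested dictionaries, where a dict value is a child node and any other value is a property. The `diff_tree` function and the operation names are hypothetical.

```python
def diff_tree(existing, incoming, path=""):
    """Return the operations needed to turn `existing` into `incoming`.

    Both trees are nested dicts: a dict value is a child node, any other
    value is a property. Unchanged items produce no operation at all.
    """
    ops = []
    for name in sorted(existing.keys() | incoming.keys()):
        child_path = f"{path}/{name}"
        old, new = existing.get(name), incoming.get(name)
        old_is_node, new_is_node = isinstance(old, dict), isinstance(new, dict)
        remove_op = ("remove_node" if old_is_node else "remove_property", child_path)
        create_op = ("add_node" if new_is_node else "set_property", child_path, new)
        if name not in incoming:
            ops.append(remove_op)          # gone from the import: delete it
        elif name not in existing:
            ops.append(create_op)          # new in the import: create it
        elif old_is_node and new_is_node:
            ops.extend(diff_tree(old, new, child_path))  # compare children
        elif old_is_node or new_is_node:
            ops.extend([remove_op, create_op])  # node became property or vice versa
        elif old != new:
            ops.append(create_op)          # property value changed
    return ops


existing = {"a": {"title": "Old", "stale": {"x": 1}}, "b": {"keep": "yes"}}
incoming = {"a": {"title": "New"}, "b": {"keep": "yes"}, "c": {"y": 2}}
for op in diff_tree(existing, incoming):
    print(op)
```

In this example only `/a/stale`, `/a/title` and `/c` produce operations; `/b` is left alone, which is the point of importing a delta rather than overwriting the whole tree.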


Re: ContentLoader Import problem

Eric Norman
Hi Jason,

I would think the ContentLoader could be enhanced to provide more granular
import logic than the "overwrite" and "overwriteProperties" directives
provide.

For a point of comparison, in a previous (non-Sling) project I worked on we
had a similar mechanism for importing content into a taxonomy.  The
solution we ended up with for this kind of problem was to have a mechanism
to specify an import sync mode that changed how the new information was
interpreted.

For example, the import "sync mode" could be set to something like this
with a directive (or with a special tag within the content itself):

   - default - merge the new content into the existing taxonomy overwriting
   anything existing at the same location
   - update - merge the new content into the existing taxonomy by
   overwriting/updating existing content but don't create anything that
   doesn't already exist
   - add - merge the new content into the existing taxonomy but leave any
   items that already exist untouched
   - sync - same as "default" but also remove any child node of each parent
   node that has no equivalent item in the new content.


Regards,
Eric
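
As a rough illustration of the four sync modes listed above, here is a minimal sketch that applies them to trees modelled as nested dictionaries (dict values are child nodes, other values are properties). The `import_content` function is hypothetical and is not part of the ContentLoader. One assumption to flag: in "add" and "update" mode the sketch still descends into nodes that exist on both sides, so the rule is applied item by item.

```python
import copy

MODES = {"default", "update", "add", "sync"}


def import_content(existing, incoming, mode="default"):
    """Return a new tree with `incoming` merged into `existing` under `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown sync mode: {mode}")
    result = copy.deepcopy(existing)  # never mutate the caller's tree
    _merge(result, incoming, mode)
    return result


def _merge(target, incoming, mode):
    if mode == "sync":
        # Remove child nodes that have no equivalent in the new content.
        stale = [n for n, v in target.items()
                 if isinstance(v, dict) and n not in incoming]
        for name in stale:
            del target[name]
    for name, value in incoming.items():
        exists = name in target
        if isinstance(value, dict) and isinstance(target.get(name), dict):
            _merge(target[name], value, mode)  # node on both sides: descend
        elif exists and mode == "add":
            continue  # "add" never touches existing items
        elif not exists and mode == "update":
            continue  # "update" never creates new items
        else:
            target[name] = copy.deepcopy(value)


existing = {"a": {"title": "Old", "stale": {}}, "b": {}}
incoming = {"a": {"title": "New", "extra": 1}, "c": {}}
for mode in ("default", "update", "add", "sync"):
    print(mode, import_content(existing, incoming, mode))
```

As in the description, "sync" here removes only nodes; properties missing from the new content are kept.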

On Thu, Dec 6, 2018 at 10:44 AM Jason E Bailey <[hidden email]> wrote:

> We've set up a process to import content into our Sling instance and
> we're running into a problem with the ContentLoader.
> The use case is that a set of data that is managed in another part of
> our company is being provided to us in the form of a JSON object for
> loading into our Sling environment.
> This works. However, this content changes daily, sometimes properties
> will change and sometimes node structure.
> We don't want to say "overwrite" because that causes the entire tree
> structure to be deleted which is really intensive, however if we
> don't say "overwrite" then nodes that are removed from the import
> continue to exist.
> Effectively what we need is a delta, we want to delete nodes if they
> aren't in the import but otherwise leave it alone, the same thing with
> properties.
> Which, unless I'm missing something, is not a function the
> importer supports. Has anyone had to deal with this? Maybe used a
> different process?
> Thanks
> - Jason

Re: ContentLoader Import problem

Jason E Bailey-2
That would be a great addition. It may be hard to change the existing options, as that could break downstream use cases, but I'm sure there are ways of updating this. I took a look at the code, and I'm not familiar enough with Oak in this use case to make that change myself.



- Jason

On Thu, Dec 6, 2018, at 2:43 PM, Eric Norman wrote:

> Hi Jason,
>
> I would think the ContentLoader could be enhanced to provide more granular
> import logic than the "overwrite" and "overwriteProperties" directives
> provide.
>
> For a point of comparison, in a previous (non-sling) project I worked on we
> had a similar mechanism for importing content into a taxonomy.  The
> solution we ended up with for this kind of problem was to have a mechanism
> to specify an import sync mode that changed how the new information was
> interpreted.
>
> For example, the import "sync mode" could be set to something like this
> with a directive (or with a special tag within the content itself):
>
>    - default - merge the new content into the existing taxonomy overwriting
>    anything existing at the same location
>    - update - merge the new content into the existing taxonomy by
>    overwriting/updating existing content but don't create anything that
>    doesn't already exist
>    - add - merge the new content into the existing taxonomy but don't add
>    or update any items that already exist
>    - sync - same as "default" but remove all nodes from each of the parent
>    nodes if there is no equivalent item in the new content.
>
>
> Regards,
> Eric

Re: ContentLoader Import problem

Eric Norman
Hi Jason,

Yes, I would expect that the original "overwrite" and "overwriteProperties"
directives could simply be deprecated and any existing usages in the wild
could be re-mapped to an equivalent "sync mode" by the runtime with a
warning message logged about the deprecation.  The available "sync mode"
values would just have to make sure they cover all the possible
combinations.  For example, the original "overwrite" directive could be a
"replace" sync mode where any existing content at the target path is
removed before processing the new content.

Perhaps it would be worthwhile and simpler to have different "sync mode"
values for the handling of nodes vs properties so those two types of items
could be handled differently when needed?

Regards,
Eric
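
The re-mapping idea above could look roughly like the following sketch. The directive names `nodeSyncMode` and `propertySyncMode`, the baseline value "add", and the value chosen for "overwriteProperties" are all assumptions made for illustration; only the "replace" mode for "overwrite" comes from the message itself.

```python
import warnings

# legacy directive -> (hypothetical new directive, mode used when the flag is true)
LEGACY_DIRECTIVES = {
    "overwrite": ("nodeSyncMode", "replace"),
    "overwriteProperties": ("propertySyncMode", "default"),
}


def resolve_sync_modes(directives):
    """Map import directives to a (node_mode, property_mode) pair.

    Legacy boolean directives are still honoured but trigger a
    deprecation warning; explicit new-style directives take precedence.
    """
    # Assumed baseline: leave existing nodes and properties alone.
    modes = {"nodeSyncMode": "add", "propertySyncMode": "add"}
    for legacy, (replacement, mode) in LEGACY_DIRECTIVES.items():
        if legacy in directives:
            warnings.warn(
                f'"{legacy}" is deprecated; use "{replacement}" instead',
                DeprecationWarning,
                stacklevel=2,
            )
            if directives[legacy]:
                modes[replacement] = mode
    for key in modes:
        if key in directives:
            modes[key] = directives[key]  # explicit new-style value wins
    return modes["nodeSyncMode"], modes["propertySyncMode"]


print(resolve_sync_modes({"overwrite": True}))
print(resolve_sync_modes({"nodeSyncMode": "sync", "propertySyncMode": "update"}))
```

Keeping separate modes for nodes and properties, as suggested, means a caller could for instance sync the node structure while only updating existing properties.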

On Fri, Dec 7, 2018 at 6:25 AM Jason E Bailey <[hidden email]> wrote:

> That would be a great addition.  It may be hard to change the existing
> options as that could break downstream use cases but I'm sure there's ways
> of updating this. I took a look at the code and I'm not familiar enough
> with oak in this use case to make that change.
>
>
>
> - Jason

Re: ContentLoader Import problem

Jason E Bailey
The code base needs some work and isn't intuitive to work with. To keep compatibility, the configuration options for merging are extensions of overwrite: you must enable overwrite before nodes can be merged, and you must enable overwriting of properties before properties can be merged.

At some point I need to go back in and create new methods with different names that better reflect the available options; then I'd be able to deprecate the older ones.

--
Jason
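
The dependency described above, where each merge option only takes effect on top of the matching overwrite option, can be sketched as a small settings object. The class and field names are hypothetical and are not the ContentLoader's actual API. Raising an error is also an assumption made for clarity; a real importer might just ignore a merge flag set without its overwrite counterpart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportSettings:
    """Hypothetical import flags: each merge flag extends an overwrite flag."""

    overwrite: bool = False
    merge: bool = False
    overwrite_properties: bool = False
    merge_properties: bool = False

    def __post_init__(self):
        # Merging nodes is only meaningful once overwrite is enabled.
        if self.merge and not self.overwrite:
            raise ValueError("merge requires overwrite")
        # Likewise for properties.
        if self.merge_properties and not self.overwrite_properties:
            raise ValueError("merge_properties requires overwrite_properties")


settings = ImportSettings(overwrite=True, merge=True)
print(settings)
```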

On Sat, Dec 8, 2018, at 1:39 PM, Eric Norman wrote:

> Hi Jason,
>
> Yes, I would expect that the original "overwrite" and "overwriteProperties"
> directives could simply be deprecated and any existing usages in the wild
> could be re-mapped to an equivalent "sync mode" by the runtime with a
> warning message logged about the deprecation.  The available "sync mode"
> values would just have to make sure they cover all the possible
> combinations.  For example, the original "overwrite" directive could be a
> "replace" sync mode where any existing content at the target path is
> removed before processing the new content.
>
> Perhaps it would be worthwhile and simpler to have different "sync mode"
> values for the handling of nodes vs properties so those two types of items
> could be handled differently when needed?
>
> Regards,
> Eric