Databricks Repos (Chapter 3)

Databricks Repos is a visual Git client in Databricks. It supports common Git operations such as cloning a repository, committing and pushing, pulling, branch management, and visual comparison of diffs when committing.



Summary:

1. Clone the target repository.
2. Create a new development branch, or select an existing branch that the team has agreed to use for development work.
3. Always work on a feature branch rather than developing directly on the main branch.
4. Make changes (add items, modify code) and test them.
5. Commit and push the changes to your branch.
6. Finally, merge the changes into main (in accordance with your organization's code review and release process).
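
For readers who think in terms of the Git command line, the workflow above corresponds roughly to the sequence of git commands sketched below. This is only an illustration: Databricks Repos performs these operations through its UI, and the repository URL, branch name, and commit message used here are hypothetical placeholders.

```python
from typing import List


def workflow_commands(repo_url: str, branch: str, message: str) -> List[List[str]]:
    """Return git CLI commands that mirror steps 1 to 5 of the workflow.

    Every command after the clone would be run inside the cloned directory.
    Step 6 (merging into main) is left to the Git provider's review process.
    """
    return [
        ["git", "clone", repo_url],               # step 1: clone the target repository
        ["git", "checkout", "-b", branch],        # steps 2-3: create a feature branch
        ["git", "add", "--all"],                  # step 4: stage the tested changes
        ["git", "commit", "-m", message],         # step 5: commit ...
        ["git", "push", "-u", "origin", branch],  # ... and push to your branch
    ]


if __name__ == "__main__":
    # Hypothetical values, used only to print the command sequence.
    for cmd in workflow_commands(
        "https://github.com/example-org/example-repo.git",
        "dev",
        "Add ingestion notebook",
    ):
        print(" ".join(cmd))
```

The function only builds the commands; it does not execute them, so it is safe to run anywhere.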

Databricks supports the following Git providers: GitHub, Bitbucket Cloud, GitLab, Azure DevOps, AWS CodeCommit, and GitHub AE.

For the following tasks, work in your Git provider:

  1. Create a pull request.
  2. Resolve merge conflicts.
  3. Merge or delete branches.
  4. Rebase a branch.
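
As an illustration of the first task, and assuming GitHub is the provider, a pull request can also be opened through GitHub's REST API instead of its web page. The sketch below only builds the HTTP request and does not send it. The owner, repository name, token, and title are hypothetical placeholders.

```python
import json
import urllib.request


def build_pull_request(owner: str, repo: str, token: str, head: str,
                       base: str = "main",
                       title: str = "Merge dev into main") -> urllib.request.Request:
    """Build (but do not send) a GitHub 'create pull request' call."""
    payload = {"title": title, "head": head, "base": base}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # the PAT generated on GitHub
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values; to actually send the request, pass it to
    # urllib.request.urlopen(req).
    req = build_pull_request("example-org", "example-repo", "<github-pat>", head="dev")
    print(req.get_method(), req.full_url)
```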
                                         


On the GitHub page:
1. Configure Git integration in your workspace and have a repo ready.
2. Databricks connects to Git using your username and a personal access token (PAT).
3. Go to the Settings page on GitHub, click Developer settings, click Personal access tokens, and then click Generate new token.
4. Fill in the Note and Expiration fields, select the repo scope, and click Generate token.
5. Copy the generated alphanumeric token and save it temporarily in a secure location.

On the Databricks workspace:
1. Go to User Settings, open the Git Integration tab, select your Git provider, enter your credentials, and save.
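
The same credentials can, as far as I know, also be registered programmatically through the Databricks Git credentials REST endpoint (`/api/2.0/git-credentials`). The sketch below builds such a request without sending it. The workspace host, environment variable names, and the `gitHub` provider identifier are assumptions, so check them against the current Databricks REST API reference before relying on them.

```python
import json
import os
import urllib.request


def build_git_credential_request(host: str, databricks_token: str,
                                 git_username: str, git_pat: str,
                                 provider: str = "gitHub") -> urllib.request.Request:
    """Build (but do not send) a request that registers Git credentials."""
    payload = {
        "git_provider": provider,          # assumed identifier for GitHub
        "git_username": git_username,
        "personal_access_token": git_pat,  # the PAT generated at the Git provider
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical environment variable names; reading the tokens from the
    # environment keeps them out of notebooks and source files.
    req = build_git_credential_request(
        host=os.environ.get("DATABRICKS_HOST", "https://example.cloud.databricks.com"),
        databricks_token=os.environ.get("DATABRICKS_TOKEN", "<databricks-token>"),
        git_username=os.environ.get("GIT_USERNAME", "<git-username>"),
        git_pat=os.environ.get("GIT_PAT", "<git-pat>"),
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```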

On the GitHub landing page:
1. Create a repo.
2. Copy the repo URL from the Code button.

On the Databricks workspace:
1. Go to Repos in the sidebar.
2. Click Add Repo, specify the URL copied earlier, and click Create.
3. The repo is now cloned, and the new repo contains a local copy of the upstream repo. Navigate the repo and start developing before committing changes.
4. To create a branch called dev, click the main branch and create the new branch by specifying its name.
5. Make changes in the dev branch by creating folders or whatever else is needed, and then add a notebook (you can either create or import one).
6. Commit all changes at once or only a selection of them: provide a summary, then commit and push.
Note that pulling regularly is important to avoid merge conflicts, especially when multiple developers are working on similar changes.
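
The cloning and pulling steps above can also be scripted against the Databricks Repos REST API. As I understand it, `POST /api/2.0/repos` clones a repository into the workspace, and `PATCH /api/2.0/repos/{repo_id}` with a `branch` field checks out that branch or, if it is already checked out, pulls its latest commit. The following is a minimal sketch that only builds the requests. The host, token, workspace path, and repo ID are hypothetical, and creating a new branch is not covered here.

```python
import json
import urllib.request


def _request(host: str, token: str, method: str, path: str,
             payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def clone_repo_request(host: str, token: str, git_url: str,
                       workspace_path: str,
                       provider: str = "gitHub") -> urllib.request.Request:
    """Request that clones git_url into the workspace (the Add Repo step)."""
    payload = {"url": git_url, "provider": provider, "path": workspace_path}
    return _request(host, token, "POST", "/api/2.0/repos", payload)


def update_branch_request(host: str, token: str, repo_id: int,
                          branch: str) -> urllib.request.Request:
    """Request that checks out an existing branch, or pulls it if already current."""
    return _request(host, token, "PATCH", f"/api/2.0/repos/{repo_id}", {"branch": branch})


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    host, token = "https://example.cloud.databricks.com", "<databricks-token>"
    clone = clone_repo_request(
        host, token,
        "https://github.com/example-org/example-repo.git",
        "/Repos/someone@example.com/example-repo",
    )
    pull = update_branch_request(host, token, repo_id=123, branch="dev")
    for req in (clone, pull):
        print(req.get_method(), req.full_url)
```

Running the `PATCH` call on a schedule, or at the start of each working session, is one way to follow the advice about pulling regularly.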






Reference and Credit - Databricks.com
